What I have learnt from Mind over Machine


This is a book from 1986, by Hubert L. Dreyfus and Stuart E. Dreyfus. I have the revised paperback edition, dated 1988.

  • Expert systems experienced a hype bubble and its deflation in the mid-1980s. DM Data maintained an index of the top 30 companies, which, for example, fell 26% in the second half of 1986.
  • Neural networks were seen as AI's new hope around that time, 1986-1988.
  • What is meant by a subsymbolic system? Is there more here than another name for neural networks?
  • Associationism apparently was some theory of intelligence which has failed. Without knowing what it is, I'm guessing that it oversimplified the way we use memory to give rise to intelligence. I don't think there is an easy answer to the question "name one aspect of intelligence which cannot be explained by using association in its various forms". However, that doesn't solve anything. It's the details of how they work that are interesting and mysterious.
  • The commonsense-knowledge problem, the big one. The biggest chicken-and-egg of them all. How can you generalise properly? In a natural way? Write a general rule to generalise, and you will find that it's not general enough. It implies you need common sense. Which is a whole lot of rules, mostly implicit.
  • Commonsense knowledge has a lot to do with embodiment.
  • The commonsense knowledge problem has been a focus since the mid-1970s.


Aims of the book

This book is imbued with criticism: it is a thoroughly negative attack on a young branch of science which is still in the process of finding its feet. Instead of saying "let's change how we approach the problems", it is full of dissuasion: one of its aims is "clearing the air of false optimism and unrealistic expectations". Another quote: "... twenty-five years of artificial intelligence research has lived up to very few of its promises and has failed to yield any evidence that it ever will". And another: "... no significant progress in AI has been reported since the commonsense knowledge problem surfaced a decade ago..." The net effect of the book would be pressure for AI funding to decrease, which would be a frustrating slow-down in the advancement of the field. In the short term, through its spin-offs, AI research is making a great impact on the world. It is also tremendously important in the long term. It will take several paradigm-shifting breakthroughs, but there is no doubt that eventually we will figure it out. Resources must be poured in and large-scale concerted projects must be launched if breakthroughs are to be achieved. On the other hand, I highly doubt that any of today's favourite topics, like artificial neural networks and statistical models, will play a central role in the end; they may be used as tools or components. We haven't discovered the path to AI yet, and it's not a matter of scaling up current techniques. So investment must be broad, and somehow made to favour the exploration of original ideas rather than sustain the continued chewing of stale ones.

My main attack on this book: it's full of pessimism, but it's not constructive. It claims that there are some things that computers will never be able to do. (Note that it sometimes uses the term "logic machine" interchangeably with "computer", even though it is aware that computers can be used in other ways, which is one of the things it is optimistic about.) Its arguments against logic machines are good. They show clearly what else we need to consider. They expose the essential differences between the capabilities of human cognition and the capabilities of logic machines. The book makes a strong case for the development of machines which learn autonomously, and evolve, rather than being hand-coded. So it's a useful book for AI researchers to read: it explores how some reasonable-looking ideas, upon closer examination, turn out to be dead ends.

Notes

  • The fulcrum of the whole book is the question: can human intelligence, whole or any given part of it, be decomposed and represented as a set of rules? (This is referred to as the Socratic assumption.)
    • Some philosophers believed that it can.
    • Others observed that human intelligence is knowing what to do in a given situation. But our ability to write down the instructions for what to do in any given situation is limited.
    • This is the contrast of knowing how and knowing that.
    • I think that theoretically it is possible to represent intelligence as a set of rules, but anything non-trivial will take a prohibitively extensive set of rules. In fact if you've been doing something new for just a week, it might take the rest of your life to write down (and debug, and rewrite until it's perfect) the rules of how to do it.
    • So the bottleneck seems to be the transfer of information. It's especially difficult, I find, to communicate spatial information, about the shape of complicated objects and things like that.
      • For instance, I'd have no trouble programming a robotic excavator to dig up the first scoop from a flat area of sand. But once there is a hole there and your program has to work with respect to the hole, and do so as effectively as a human who understands a lot about the shape of the hole, then we run into real difficulties, because we need a whole vocabulary of shapes which is available inside the human mind, but not in any language.
    • So, there are the following problems:
      1. Lack of vocabulary.
      2. Lack of awareness of what rules you are actually following, and in what order. (That is, if you set yourself the task of writing them down, you would be revising them many times over, as you observe yourself act in other scenarios.) Note that many things are implicit. For example, if you are faced with a certain object, let's say a book, this might activate a particular system in your brain (which might be simple) for how to pick it up. So no rule of the following sort actually gets run: "pick up the object", "which object is this?", "is it large?", "what is its aspect ratio?", and so on. Instead, there are probably some heuristics that recognise various useful descriptions of the object, and a few of those activate the procedures for picking up such an object. So you might think that you are well aware of how you do things, but I'm pretty sure you don't know how you recognise things, which is probably the largest part of a task like picking up a book. It's a vast collection of very simple programs, and an incredibly good system for choosing the right one.
      3. Even if the above two problems (lack of vocabulary and lack of awareness) didn't exist, and you had the vocabulary and perfect awareness of how you do things, there would still remain the time it would take you to communicate all that knowledge.
    • How do human beings deal with these? When teaching someone else a skill, we use language for those things that are effectively communicated by language; for other things, we show how to do them and let the learner copy.
    • Therefore, we should try to build a machine which can learn by watching and copying.
    • That means we need a machine which can see and think about what it's seeing as effectively as we do. More importantly, in ways similar to how we think about things. This should be the first step for AI. You might say: what about blind people? They do many things and learn by example, all without seeing. Well, they have an excellent understanding of space, through touch and sound. Also, clearly, the difficulty of communicating certain types of knowledge is much greater when we have to avoid communicating by showing.
  • The authors report that Leibniz invented binary numbers. No: he studied them. They have, of course, been invented time and again, so it's impossible to know who discovered them first. We can only cite earliest known records. Wikipedia says that it's not even clearly known which century BC they belong to!
  • Leibniz said that when going from theory to practice, many gaps are exposed in the theory. But that it's possible to write down "another theory more complex and particular". I think that it will soon become prohibitive, in terms of those three problems above, such that it will be more efficient for the learner to get the basics from the theory, and then figure out the little details themselves through practice and errors.
  • Pascal: "The heart has its reasons that reason does not know".
  • Stuart Dreyfus was working with Richard Bellman at the RAND Corporation programming JOHNNIAC, an early computer named after John von Neumann and patterned on his machine design. They were sharing the computer with Allen Newell, Herbert Simon, and Cliff Shaw.
  • General Problem Solver was built around heuristics: shortcuts which don't always work.
  • The continuum hypothesis. Patrick Winston: "Just as the Wright brothers at Kitty Hawk in 1903 were on the right track to the 747's of today, so AI, in its attempt to formalise common-sense understanding, is on the way to fully intelligent machines".
  • Y. Bar-Hillel's fallacy of the successful first step.
  • Simon and Newell predicted in 1958 that within ten years a computer would be world chess champion.
  • Stuart Dreyfus: "There's no continuum. Current claims and hopes for progress in models for making computers intelligent are like the belief that someone climbing a tree is making progress toward reaching the moon."
    • I'd say that in hindsight, climbing trees might not have been that important for reaching the moon. But playing with burning chemicals was. Rocket fuel and metal canisters. You would have to be interested in various chemicals and especially those which react absorbing a large amount of energy, so that they can release it while burning. To find them, you would have to explore various sources, especially minerals. So to get to the moon, the best next step is to start digging a deep hole in the ground.
  • Hubert Dreyfus published "Alchemy and Artificial Intelligence" in the mid-1960s, which was regarded as an inconvenience by the RAND Corporation, and for which he was treated by AI researchers as an enemy, and avoided. (Quote: "... Weizenbaum was the only one in the M.I.T. AI laboratory who would speak to us after the publication of the RAND paper, ...".) I think that publishing that paper was like seeing a baby who is struggling to roll over, and reporting that it's not making much progress towards running. I think that AI research deserves funding, that the defense budget was their primary source of funding at the time, and by focusing on the optimism (which was real) they could maximise their funding, which was the right thing to do whichever way you look at it. Dreyfus, even though he's not wrong in what he says, exposes the pessimistic side of things (which coexists with optimism in any endeavour of uncertainty), thereby applying pressure on the funding to decrease, which means that it would have been diverted to other types of military research, like more effective killing machines. So I would tend to agree with the early AI pioneers in seeing Dreyfus's efforts as a nuisance.
  • In the abstract of "Alchemy and Artificial Intelligence", Dreyfus says: "... the attempt to analyze intelligent behaviour in digital computer language systematically excludes three fundamental human forms of processing (fringe consciousness, essence/accident discrimination, and ambiguity tolerance)."
  • In its evaluation of the Strategic Computing Program (Strategic Computing Initiative), the Office of Technology Assessment cautions: "Unlike the Manhattan Project or the Manned Moon Landing Mission, which were principally engineering problems, the success of the DARPA program requires basic scientific breakthroughs, neither the timing nor nature of which can be predicted".
    • This ten-year project ran with a billion-dollar budget from 1983 to 1993. Roland covers it in his book. The authors (Dreyfus and Dreyfus) warn against spending that amount of money this way. (The book was written at the beginning of the project.) Total funding was not cut, but funding for the AI components was drastically re-directed to other components mid-way through the project, because progress on the AI was slow. The project failed to achieve human-level AI, but more than paid for itself in the hardware advancements it stimulated, and in creating DART, logistics-planning software using techniques from AI.


The five-stage model of skill acquisition

  1. Novice. Uses context-free rules. (Things that always apply, and that can be easily written down in a book, or explained by diagrams.)
  2. Advanced beginner. Learns to recognise signs and situations that can only be learnt through experience. Example: student nurses learning the sound of breathing characteristic of a medical condition.
  3. Competent. Overwhelmed by the many factors that become apparent in a domain. Able to identify the few most important aspects of the context, ignore the unimportant, deliberate, and come up with plans.
  4. Proficient. "Involved". It feels as though the important aspects call attention to themselves, rather than the performer identifying and sorting them. Decisions are still carried out deliberatively (logically).
  5. Expert. Fully "involved". Decisions now also take care of themselves. No mental deliberation occurs. It's like moving from a discrete world into a smooth continuous world. Note: performing expertly still requires a lot of effort in some domains. But the expert has so much experience, that it's obvious what to do in any given situation within their domain of expertise.
  • The level assigned to a given performer is the one at which they produce their best performance. If a novice tries to think like an expert, then due to a lack of experience, they will fail (try to do a somersault if you haven't done any gymnastics before). A novice achieves the best performance possible for them when they apply context-free rules. Experience enables advancement to a higher level. Being a novice is the necessary scaffolding to reach further.
  • For different skills, these five layers have different thicknesses. Some skills are easier than others, and you can reach expert within weeks. (Not sure if much less time is enough for any skill!) For others, like chess, there are only a few dozen experts in the world.
  • Examples:
    • Most of us are experts at walking. We don't deliberate about moving individual muscles.
    • Most drivers with a few years of daily driving experience are expert drivers (I'm using the word "expert" in this technical sense.)
    • Most of us are expert readers.
    • For the three examples above, you might remember how long the learning process took. However, I can't seem to remember the individual stages — it was too long ago.
  • This all seems to be about the management of saliency: what features are the most important in any given context?
  • The common-sense problem seems to fit in as an example of a skill: the skill of living in the world. Or a bundle of skills. And most of us are expert or proficient at all of these skills. Otherwise we wouldn't survive that long. But each skill has its own domain, so if you start on a new skill, to an extent you're starting from scratch. (The extent is down to the foundation of the developed skills on which you draw. So, if you started a new sport, then you will use vision, movement, hearing.)
  • Other terminology for the difference between competent and proficient (which seems to be the most important barrier): knowing that and knowing how, respectively.
    • Knowing how is the same thing as what embodiment is all about, in the words of Brooks, using the world as its own model. There is no internal model, just associations of the situation the agent is in, and the actions that it should perform.
    • Knowing that is all about working with an internal model.
    • The brain needs both for intelligence.
  • A competent chess player (Stuart Dreyfus) finds grandmasters' games inscrutable.
  • A chess master can recognise an estimated 50,000 positions.
  • Experiment: an International Master was pitted against a Master-level chess player, each having 5 seconds per move. While playing, the International Master was required to add up numbers mentally. He still performed exceptionally.
  • The authors attack Herbert Simon's theory of chunks (pattern-recognition) to explain the method of a proficient or expert performer. The argument is not convincing: the authors take the pattern-matching of chunks too literally. Clearly, no one knows what the chunks involved actually are, that's one of the crucial problems of intelligence: when are two situations functionally the same, and when are they different? I think that experience will wire chunks up in the right way, one pair at a time, and that they do not necessarily correspond to a literal pattern on a chessboard (where some out-of-the-way pawn doesn't make a difference), but to a function of the literal pattern.
  • When experts try to explain why they did something, they have difficulty, and sometimes they say that they feel like they are inventing the reasons they give.
  • The authors use the word holistic a lot. It's their way of asserting that certain actions cannot be broken down into smaller (meaningful!) components. (We can break things down into neurons, but they are likely not to be meaningful by themselves.)
  • The proficient and expert performers sometimes meet problems. They then deliberate, but the mending of the problem still happens directly, in mysterious holistic ways, as a result of that deliberation.
  • Experts develop tunnel vision, where they are stuck on a particular railway track.

Visual and spatial intelligence

  • Shepard and Metzler's 1971 experiments have shown that mental rotation of objects takes time proportional to the amount of rotation, which indicates that humans do mental rotation spatially somehow. (What could the representation be???)
  • The straightforward engineering representation of objects as points, lines, and faces, is completely different from the representation of the same scene that a human being uses. Humans don't know the exact dimensions, but they are able to represent all of the semantic aspects of objects in an elastic, watertight way.
  • About face-recognition, the authors say: "Two faces might appear to be alike because both have gentle, mocking, or puzzled expressions. Recognizing that does not involve finding certain features they share. Indeed, there is no reason to think that expressions have any elementary features." I disagree. I think that expressions are made up of features, but these features are not easy to describe.
  • Hofstadter is quoted about the shape of letter 'A': 'Nobody can possess the "secret recipe" from which all the (infinitely many) members of a category such as "A" can in theory be generated. In fact, my claim is that no such recipe exists.' I would tend to agree, but generation is not recognition. I believe that there definitely is a recipe for recognising letter "A", and that it's practically implemented in at least a billion devices on the planet, I mean human brains.

Holograms are holistic

  • The authors use holograms as an example of holistic devices. The image is formed by the interference of wavefronts from many areas of the hologram's surface. So a picture of a table is not stored in any small part of a hologram, it is stored effectively in all areas of the hologram. If we cut the hologram into two, then each half gives the whole image, except that it is more blurry.
  • Holograms can be used as associative memory in two interesting ways:
    • When two images have been encoded into the hologram, projecting one makes the other appear.
    • Suppose a hologram is made of a page of a book, and another hologram of the letter 'F'. Then if both holograms are used together, we will get a dark area with bright spots everywhere on the page where the letter 'F' appears. (A rough digital analogue of this matched filtering is sketched after this list.)
  • However, the authors note that humans can recognise similarities despite major changes (for faces: expression, hats, glasses, hairstyle). So maybe they work like holograms, but indirectly with some invariant features.
  • That kind of a capability could be useful to have, and networks of neurons might very well be able to implement a hologram-like device. So maybe that's one of the ways that the brain does similarity detection so quickly.
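To make the matched-filtering idea above concrete, here is my own rough digital analogue in Python. It is not a simulation of an actual hologram and is not from the book; it assumes numpy and scipy are available, and the tiny 3x3 glyphs are made up purely for illustration.

  import numpy as np
  from scipy.signal import correlate2d

  # Tiny binary glyphs (3x3), made up for illustration.
  F = np.array([[1, 1, 1],
                [1, 1, 0],
                [1, 0, 0]])
  O = np.array([[1, 1, 1],
                [1, 0, 1],
                [1, 1, 1]])

  # A "page" with an F at the top-left and an O at the bottom-right.
  page = np.zeros((8, 8))
  page[0:3, 0:3] = F
  page[5:8, 5:8] = O

  # Cross-correlating the page with the F template is the digital analogue of
  # using the two holograms together: the response is brightest where F occurs.
  response = correlate2d(page, F, mode="same")
  peak = np.unravel_index(np.argmax(response), response.shape)
  print("brightest spot at", peak)  # lands on the F, not the O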

You can divide and conquer only so far

  • We can divide most machines made by humans into components, and then each of the components that we can't divide further is a simple machine that we can analyse. It could be a button: it's made up of an electrical contact, a spring, and several solid parts with known densities, elasticities, and other mechanical properties. We know how all of these simple components behave. We can decompose a hologram into a laser, lenses, and mirrors. But then we still have the holographic plate and the interference pattern, which is an atomic component, and we'll have to analyse it as a whole, using the laws of physics directly. Parts of the mind might be like this too. Say, the processing done by a given topographic map in a region of the brain might be described by a model, but not broken down further into components.
  • Quote: "Even if, at some deep unconscious level, the brain does match typical cases by using some subtle features no one has ever dreamed of, that is no help to the programmer, whose only way of finding such features other than sheer luck has to be his own introspection and the observation and interrogation of experts." That's right, but there are ways to parallelise and precipitate sheer luck.
  • Heuristic. When solving a soft problem, you might divide it very well into components, and then meet difficulty. In that case, try to write a small holistic system, and evolve it! For example, in OCR the approach is often to divide words into letters, and then the letters are analysed using neural networks, which are holistic systems. (It's likely that it's possible to do better than applying a neural network directly to pixels.) A minimal sketch of this division of labour follows this list.
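A minimal sketch of the divide-then-holistic heuristic, not any system described in the book: the "divide" step cuts a word image into fixed-width glyph slices, and a small neural network plays the holistic role. I assume scikit-learn's bundled digits dataset as a stand-in for letter images.

  import numpy as np
  from sklearn.datasets import load_digits
  from sklearn.model_selection import train_test_split
  from sklearn.neural_network import MLPClassifier

  digits = load_digits()  # 8x8 greyscale images of the digits 0-9
  X_train, X_test, y_train, y_test = train_test_split(
      digits.images, digits.target, random_state=0)

  # The holistic component: a small multi-layer perceptron over raw pixels.
  net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
  net.fit(X_train.reshape(len(X_train), -1), y_train)

  def read_word(word_image, glyph_width=8):
      """The 'divide' step: cut a word (glyphs laid side by side) into glyphs,
      then let the network classify each glyph as a whole."""
      glyphs = [word_image[:, i:i + glyph_width]
                for i in range(0, word_image.shape[1], glyph_width)]
      return [int(net.predict(g.reshape(1, -1))[0]) for g in glyphs]

  # Build a three-glyph "word" from test images and read it back.
  word = np.hstack(X_test[:3])
  print(read_word(word), "expected", list(y_test[:3]))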

Being aware of what we don't know

  • This is an interesting sub-problem in AI: how is it that you are instantly aware that you don't know something, like the phone number of Karl Pribram? The way I think my brain does it is: I know only very few phone numbers, and they are all of people I've met, and I've never met Karl Pribram, so I don't know his phone number. It seems to take slightly more time with birthdays, because I know many birthdays of people I've never met. Note that in this case I might say "I don't know Newton's birthday", and then "oh wait, I know it, because it was Christmas Day!", so sometimes the connection is not immediately obvious.
  • Anyway, this is a reminder to use representations which allow you to declare the absence of a fact as quickly as its presence. (A toy sketch of this follows.)
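A toy sketch of the closed-world reasoning I describe above; the names and numbers are hypothetical. The point is only that absence can be declared as quickly as presence when the query is answered against a small, explicitly bounded set.

  # Only people I have actually met; names and numbers are hypothetical.
  phone_numbers = {
      "Alice": "555-0101",
      "Bob": "555-0102",
  }

  def phone_number_of(person):
      # One hash lookup answers "I know it" and "I don't know it" equally fast.
      if person in phone_numbers:
          return f"{person}: {phone_numbers[person]}"
      return f"I don't know {person}'s phone number."

  print(phone_number_of("Alice"))
  print(phone_number_of("Karl Pribram"))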


Anti-caution against working with specific data

  • Minsky says that it may seem wrong to work with a particular thesaurus-like classification of knowledge, but that it isn't actually wrong. I think that the path towards generality and abstractness lies through many specific and particular items.

Micro-worlds

  • Winograd's SHRDLU (the "blocks world") is a neat piece of work. About 40 years later it still amazes me what it can do.
  • I think that the idea of micro-worlds is a fundamental idea for intelligence. Everything that we do actually takes place in one of the micro-worlds of which every domain where intelligence operates is composed. Note that the tasks that we do are generally quite simple: take one object from place A to place B, for example. The difficult part is keeping track of which micro-world applies when, where, and in what situation.
  • But then something weird happens. Minsky and Papert examine the micro-world of bargaining, and list about 50 different things, like "plan", "children", "conformity", "words", "houses", that are primitives that we would need to work in the micro-world of bargaining (and those 50 are just a sample). But when we think in terms of bargaining, we don't need to understand all these other things. We would only need to understand the process of exchange, the time sequence of events in it and the actions that we can take, and know a set of objects and their value. Value is a complicated issue, but it lies outside the micro-world of bargaining. All this micro-world needs is the existence of a function which, given two items, would tell us which one is more valuable and, qualitatively, by how much. (A sketch of such an interface appears after this list.)
  • We are only conscious of very few things at any one time. This is our micro-world.
  • When in a given micro-world, we abstract. In other words, represent something else we know by a symbol, which doesn't expand into its meaning unless we ask it to.
  • The authors assert that every real situation (what they refer to as a world) presupposes the whole of commonsense knowledge. However, I think that it's not like that. Commonsense knowledge is composed entirely of micro-worlds, tightly connected together, so that we move among them fluidly. If you think that you can keep a lot in your working memory at the same time, that's just an illusion.
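Here is a sketch of the comparison interface I have in mind for the bargaining micro-world; it is my own illustration, not Minsky and Papert's formulation, and the item names and values are hypothetical placeholders. How value is computed lives entirely outside the micro-world.

  # Placeholders: how these values arise is outside the micro-world.
  values = {"goat": 30.0, "sack of grain": 10.0, "silver coin": 12.0}

  def compare_value(a, b):
      """Return which item is more valuable and, qualitatively, by how much."""
      va, vb = values[a], values[b]
      more = a if va >= vb else b
      ratio = max(va, vb) / min(va, vb)
      if ratio < 1.2:
          degree = "about the same value"
      elif ratio < 3:
          degree = "somewhat more valuable"
      else:
          degree = "much more valuable"
      return more, degree

  print(compare_value("goat", "silver coin"))  # ('goat', 'somewhat more valuable')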

The problem of changing relevance

  • This is a problem of keeping track of which facts (from a huge amount of knowledge that you have) apply in any given situation.
  • This is the frame problem.
  • Suppose we have a list of things we know about a situation. When something changes, how do we update this list? Go through each thing and update it separately? That would be inefficient. (A toy contrast between that naive approach and an indexed one is sketched after this list.)
  • Statement: "the problem of finding a representational form permitting a changing, complex world to be efficiently and adequately represented".
  • Solutions: Minsky's frames, and Schank's scripts. They talk about expectations in a normal situation. For example: going to a restaurant.
  • The problem, according to Dreyfus and Dreyfus: there are some core expectations, and many more expectations that are still relevant, but occur only sometimes. Like seeing someone you know. Thousands of these. If we keep track of all that, the complexity increases, because we would have to check all these cases. But being human, it doesn't feel like that at all.
  • I think that expectations are important, but it is also true that we can deal with a set of constraints that we have never dealt with before. So we cannot pre-program everything. We also don't have to learn each situation by trial-and-error.
  • The fluidity of our everyday experience, that the authors keep referring to, I think, is only a sort of an illusion. Just like the fluidity of water, when it is quite grainy at the nano-scale. But with our everyday experience, you don't need to zoom in that far. I think that, say, driving on a highway, and encountering a sign, is composed of maybe five discrete representations, which we switch between in a hard and crisp way. We don't notice the switch event (except sometimes?), because there have to be some things we don't notice. Otherwise we will notice noticing, and then we will notice the coincidence of noticing one thing and another, and then notice noticing the coincidence, and the whole thing explodes exponentially.
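A toy contrast for the updating problem above, certainly not a solution to the frame problem: facts are stored as tuples, and an index keyed by the objects a fact mentions lets an update touch only the facts that could possibly change, instead of re-checking every fact. The situation and predicates are hypothetical.

  from collections import defaultdict

  # Hypothetical situation, stored as a set of fact tuples.
  facts = {("on", "book", "table"), ("on", "cup", "table"),
           ("in", "table", "kitchen"), ("closed", "door")}

  # Index: object -> facts that mention it, so an update need not scan everything.
  index = defaultdict(set)
  for f in facts:
      for obj in f[1:]:
          index[obj].add(f)

  def move(obj, new_place):
      """Touch only the facts that mention obj; all other facts stay as they are."""
      for f in list(index[obj]):
          if f[0] == "on" and f[1] == obj:
              facts.discard(f)
              index[obj].discard(f)
              index[f[2]].discard(f)
      new_fact = ("on", obj, new_place)
      facts.add(new_fact)
      index[obj].add(new_fact)
      index[new_place].add(new_fact)

  move("book", "shelf")
  print(sorted(facts))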

The gestalt problem

How do we recognise the similarity of an aspect of a situation to an aspect that we've met before (and know how to deal with)?

The authors assert that this is done in a holistic way, not by breaking down into features and comparing feature-wise.

Early holistic work

  • Holographic. Convolution, interference pattern.
  • Neural networks. How to encode the new information without erasing the earlier-acquired information? Again the problem of appropriate generalisation here.
    • McClelland and Rumelhart's neural net learnt the past tenses of English verbs from repeated examples without being given or creating any rules.
  • Feldman's spreading activation.
  • Hofstadter's strange loops and tangled hierarchies, funnelling. Hofstadter sees symbols as necessary, while the authors think that signatures are how the brain works with similarity. I think these are more or less the same. Either way, it is a subnetwork of neurons that are strongly connected to each other.
    • What happens when we look at a painting where we don't instantly recognise what is painted? And then we do. What sort of activity goes on in the brain?
    • Generalised idea of perspective: it is an interpretation. So, if we look at a line drawing of a cube or an Escher picture, then we can adopt one perspective or another.
  • Minsky writes: "So we shall view memories as entities that predispose the mind to deal with new situations in old, remembered ways—specifically, as entities that reset the states of parts of the nervous system."

The authors assert that face recognition proceeds without the detection of features.

Expert systems

  • It is practically impossible to express an expert's knowledge as a manageable set of rules.
  • The prophet Euthyphro tells Socrates examples of piety, but cannot state the rules. Similar stories with other experts. Socrates concludes that he knows nothing.
    • Plato says that experts once knew the rules they used, but had since forgotten them.
  • Feigenbaum: "... knowledge threatens to become ten thousand special cases".
  • Arthur Samuel's checkers program. "The experts do not know enough about the mental processes involved in playing the game."
  • The authors believe that the experts are not following any rules. They are, instead, recognising special cases.
  • Pribram says: "One takes the whole constellation of a situation, correlates it, and out of that correlation emerges the correct response." It falls out. That's why it happens so fast.
  • The successful expert systems at the time, as surveyed by the authors, are all from domains where the human practitioners had to rely on systematic logical inference, rather than pattern recognition.
  • In advanced chess, at the highest levels, the players do not consider millions of possible moves, but only a few that are the most plausible or favourable. How do they know which, given that they clearly have never encountered that specific position before?
    • An insight is that they often have a strategy: a goal that they want to achieve. Whereas a machine will look at millions of moves to a certain depth, without being biased by any goal.
  • Experts rated 70% of MYCIN's recommendations as acceptable.
    • Another example from medicine is RECONSIDER, which gives the doctor alternative hypotheses which might have been missed.
    • Such systems are not good on their own, but useful additions to the arsenal of the doctor.
    • Systems of this sort can deal with about 70% of routine work. They print nice reports. An expert can quickly check the report and, if they agree with it, save time by accepting it.
  • A domain is right for an expert system if somewhat inferior performance is acceptable, but non-expert performance is not.

Computers used for education

  • Research by Elton Mayo: lighting in a factory was changed, and performance improved. Then they changed it back to normal, and performance improved still further.
    • Increases in attention obscure the actual effect.
  • Generally, this question of the efficacy of software for education has to be approached with extreme care on a case-by-case basis: trial and error, and insight into what actually happens educationally. Statistics alone are not enough.
  • Computers should be used as tools and for conducting drills, like spelling and solving long lists of routine problems in mathematics.
  • Micro-worlds for cognitive development.
    • Example: Robot Odyssey I. The objective is to get out of Robotropolis by building a robot.
  • Experiments show that we are able to recognise words faster than we are able to recognise letters.
  • The Geometric Supposer can be used to build any construction on any triangle. So it aids the exploration of how properties of such constructions are tied to properties of the triangles.
  • "It isn't easy to wreck a nice beach."
  • Instructor pilots teach trainee pilots how to scan instruments. In testing, it turned out that they were not following the rules that they teach: they scanned the instruments in flexible and situationally appropriate ways.
    • The authors argue that this is a superior way of doing things, and that it's important that computer aided instruction devices do not interfere as a student passes from rule-following to this kind of flexibility.
  • Whatever the educational system, we will grow out of it sooner or later.
  • Experiment by Lee Brooks. Two complicated artificial grammars, generating two sets of strings. Two groups of participants, both shown the two lists, but given different tasks. The first group: abstract the two sets of rules from the two sets of examples. The second group: given the two lists, and additional information designed to prevent them from thinking that all the items on either of the lists fell into a single category. The first group were unable to abstract the rules. The second group could classify further examples given to them into three groups: the two from before, plus one that was "unrelated". They could do it at a level well above chance.
    • I wonder if the first group could also classify them well, given that they had studied them for some time. Further, I wonder if there were some basic features ("texture" in the patterns, for example if one grammar always produces letters in pairs, such as aabbaacc, and another never does) that allowed classification without being able to abstract the entire set of generator rules. (A toy version of this idea is sketched after this list.)
  • Using the computer as a tutee. I disagree with the authors: within their own framework, thinking from another's point of view holistically should develop very important skills in teamwork, leadership, and education. They seem to imply that using a computer as a tutee is restricted to verbalisation and rules.
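A toy version of the "texture" speculation above, with two made-up generators rather than Brooks's actual grammars: one generator always doubles letters, the other never does, and a single surface feature (the rate of adjacent repeats) is enough to classify new strings without abstracting either set of rules.

  import random
  random.seed(0)

  def doubled_string(n=4):
      # Made-up grammar 1: every chosen letter is written twice, e.g. "aabbccdd".
      return "".join(c * 2 for c in random.choices("abcd", k=n))

  def undoubled_string(n=8):
      # Made-up grammar 2: adjacent letters are never equal.
      s = [random.choice("abcd")]
      while len(s) < n:
          c = random.choice("abcd")
          if c != s[-1]:
              s.append(c)
      return "".join(s)

  def doubling_rate(s):
      # The single surface "texture" feature: fraction of adjacent repeats.
      return sum(a == b for a, b in zip(s, s[1:])) / (len(s) - 1)

  def classify(s):
      return "grammar 1" if doubling_rate(s) > 0.3 else "grammar 2"

  tests = [doubled_string() for _ in range(5)] + [undoubled_string() for _ in range(5)]
  for s in tests:
      print(s, "->", classify(s))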

Other interesting things in the book

  • In an experiment, participants are asked to estimate a probability given the following facts: 85% of taxis in a city are blue, and the rest are green. A witness can tell the colour of a taxi with 80% accuracy, and they report seeing a green one. What is the probability it was actually green? Most people intuitively say that it is around 80%. But in actual fact it is 41.4% (Bayes' law).
  • Another experiment: it is known that one bag contains 70 red chips and 30 blue, and another 30 red and 70 blue. One bag is sampled at random. Ten chips are taken out and replaced, and of them 6 were red and 4 blue. What are the chances that the first bag was picked? The actual answer is that the odds are better than 5 to 1. Usually, experiment participants estimate much closer to 50-50. (Both calculations are checked in the short sketch at the end of this list.)
  • Note: they have never experienced this kind of a task before, but just made a good guess. With experience, they would surely become much better.
  • Research by Amos Tversky showed that the wording of experiments similar to this, can affect the outcome significantly. Especially whether negatives are used or not. (The underlying mathematical description of the situation is kept precisely the same.)
  • Aristotle seems to have thought that before one could act, one had to deduce one's actions from one's desires and beliefs, logically. If G is the goal, and I believe action A will bring about G, then I should do A. I think that's right, and has nothing to do with skills. It's about choosing what skills to use (because it's for choosing what actions to take).
  • Aristotle's definition of man is zōion logon echon, the animal equipped with logos. This logos is a more general kind of thinking than just logical thinking.
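A short check of the two figures quoted above, applying Bayes' rule directly.

  # Taxi problem: 85% of taxis are blue, 15% green; the witness is 80% reliable
  # and reports "green".
  p_green = 0.15
  p_say_green_given_green = 0.80
  p_say_green_given_blue = 0.20
  p_green_given_report = (p_say_green_given_green * p_green) / (
      p_say_green_given_green * p_green + p_say_green_given_blue * (1 - p_green))
  print(f"P(taxi was green | witness says green) = {p_green_given_report:.3f}")  # ~0.414

  # Bag problem: bag A has 70 red / 30 blue chips, bag B has 30 red / 70 blue;
  # sampling with replacement gives 6 red and 4 blue. Prior odds are 1:1, and the
  # binomial coefficient cancels in the likelihood ratio.
  odds_a = (0.7 ** 6 * 0.3 ** 4) / (0.3 ** 6 * 0.7 ** 4)
  print(f"Odds in favour of the mostly-red bag = {odds_a:.2f} : 1")  # ~5.44 : 1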


  • One study analysed the performance of graduate students. It turned out that the four-professor admissions committee made poorer predictions of their performance than a simple statistical model using only three variables: an examination score, GPA, and a subjective assessment of the quality of the undergraduate school they came from. (A sketch of the shape of such a model follows.)
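A sketch of the shape of such a three-variable model; the weights are arbitrary placeholders, not the study's actual coefficients, and the inputs are assumed to be pre-scaled to a common range.

  def predicted_performance(exam_score, gpa, school_quality,
                            weights=(0.4, 0.4, 0.2)):
      """Weighted sum of three predictors, all assumed pre-scaled to 0-1."""
      w_exam, w_gpa, w_school = weights
      return w_exam * exam_score + w_gpa * gpa + w_school * school_quality

  print(predicted_performance(exam_score=0.8, gpa=0.9, school_quality=0.6))  # 0.8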

References

See the book.