Executive function tests: a short introduction
Executive function tests are a very special kind of neuropsychological test. They claim to measure executive functions or problem-solving skills. However, this is not as easy as one might think. That’s probably the reason there are not that many executive function tests around.
An executive function test should measure executive functions, that is: planning ahead, logical thinking, acting in accordance with hypotheses, checking one’s behavior (self-monitoring) and flexibly changing one’s hypotheses or actions. With this definition you can easily see that executive functions are always active, no matter what neuropsychological test is being administered. So this makes it really difficult to assess executive functions. When a test is measuring all executive functions, it is usually quite difficult to find out what parts of executive function are damaged. The most common approach then is to develop multi-faceted tests in which subtests assess different executive functions.
By definition, executive functions are called upon when solving a problem, and that usually happens whenever you encounter new problems. Old and familiar problems are normally solved routinely because you already know what to do, just as with a small puzzle for which you have already reached a solution and know how to solve it. One of the main problems with executive tests is that the moment you know the solution, the test can hardly be administered a second time, simply because you already know how it should be done. In such cases the test no longer measures any form of executive function. This is one of the main reasons the test-retest reliability of executive tests is very poor.
On this page I would like to discuss several executive tests that are well-known and very much used: the Tower of London Test (TOL), the Wisconsin Card Sorting Test (WCST) and the BADS (Behavioral Assessment of the Dysexecutive Syndrome). I will also discuss other tests that are extremely useful in assessing executive functions but are usually not used for that.
A review of some executive function tests
As I said earlier, I cannot discuss all available executive function tests used in neuropsychology. Therefore, I would like to review the tests which in my opinion should be used the most because of their proven ability to measure executive function deficits in patients and normal (healthy) persons.
Tower of London Test (TLT or TOL)
To be honest, I am a bit biased because this test was developed and computerized by me and it is now published by my own company, Pyramid Productions. Nevertheless, it has several qualities worth mentioning and is therefore used as an example of a fine test. See the figure here for what the test stimulus looks like on the computer screen:
It is a 12-minute computerized planning test in which you have to move small blocks (a red, a yellow and a blue one) on 3 different pegs. However, there are several rules for moving the blocks, and that makes this task difficult. You have to look carefully and plan before you make your moves. Therefore the TOL test requires quite a lot of planning (keeping several plans in your working memory and flexibly choosing between them), some self-monitoring (checking whether the moves you are making follow your plan), and self-correction (noticing when a move is wrong and correcting the error). Of course, such a task also requires good visual perception, a sufficiently large working memory and quite some concentration and divided attention (attentional control).
The TLT (copyright Pyramid Productions) was originally developed by Tim Shallice, a researcher in Cambridge. At that time it was a wooden-pegs edition, not yet computerized. My computerized edition closely follows his original task and I have added a more useful scoring system. Patients mostly evaluate this test very positively as being fun to do, relatively easy and not too long. Clinicians love the computerized edition because of the ease of administration, which lets them observe the patient more fully. This version of the TLT has reasonable norms for healthy people (n=260) and the largest norm group of neurological patients for an executive function test yet (559 stroke patients, 99 TBI patients and 254 patients with other neurological disorders, including hypoxia, multiple sclerosis and epilepsy). The TLT only requires a Windows-based computer and can be used both stand-alone and in a network.
The psychometric characteristics of the TLT have been judged insufficient by the Dutch Test Committee (COTAN) with respect to the norms and the reliability. Content validity was judged sufficient. It will come as no surprise that I disagree with the verdict on the norms. Most executive function tests are normed poorly and you can see for yourself that the TLT has relatively good norms. The manual is only in Dutch but will be translated into English, probably for release this year. It will become freely available on this page and on my www.pyramidproductions.nl page as well, so everyone can form their own judgment.
The test-retest reliability of the main index, TLTscore, ranges from .58 to .66, which is only fair, considering it is an executive function test. The classic AO1 index has a lower reliability, ranging between .48 and .53. The split-half reliability and Cronbach’s alpha coefficients are much too low, ranging from .11 to .36. This is a clear indication that in an executive function test the subitems are hardly related and cannot easily be compared with each other. Given the main assessment goal of an executive function test, this does not come as a surprise.
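To make the link between weakly related items and a low Cronbach’s alpha concrete, here is a minimal Python sketch. The item scores are invented purely for illustration (they are not TLT data), and the function name cronbach_alpha is just a label chosen for this example.

```python
from itertools import product
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of persons, each a list of item scores."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(person) for person in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical data: every person scores the same on all three items,
# so the items are perfectly related.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]

# Hypothetical data: every pass/fail pattern occurs exactly once,
# so the three items are completely unrelated to each other.
unrelated = [list(pattern) for pattern in product([0, 1], repeat=3)]

print(round(cronbach_alpha(consistent), 2))
print(round(cronbach_alpha(unrelated), 2))
```

With perfectly related items alpha reaches its maximum of 1, and with completely unrelated items it drops to zero. Values of .11 to .36 therefore sit close to the "unrelated items" end of the scale.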
The convergent validity is fair, ranging from .38 to .43. The divergent validity ranges from .12 to .28. The problem here is that in this study there was only one true executive function test that was comparable to the TLT: the WAIS-R Picture Arrangement task.
The great advantages of the computerized TLT are that it is fully standardized and has clear, short instructions for administration. The computer records all responses exactly and reliably calculates the two main indices.
The TLT is sufficiently patient-friendly, meaning that all patients can do the test without becoming too frustrated or irritated. The TLT can be administered to one-handed patients, and a large benefit is that it can be used with patients with aphasia as well.
The costs of the TLT are a bit lower than those of other executive function tests: 160 euro ex. VAT for a 1-year license on 2 computers. License renewals are 160 euro ex. VAT. When the test is sufficiently popular, the prices will go down to ensure worldwide use and distribution.
Preliminary analysis of the sensitivity and specificity of the TLT shows good values, 94.7% and 90.2% respectively. However, this analysis was done using another variable in the test and relying on a very strict cut-off point. Therefore, the sensitivity and specificity numbers can be considered inflated and must be seen as an indication only. Further research is necessary here but won’t be easy because there is no ‘gold standard’ test for planning skills. Luckily, Unterrainer et al. (2005) showed in a nice study of chess players that the TLT had strong correlations with chess, thereby demonstrating that the construct validity of the TLT is quite sufficient: the TLT does indeed measure planning skills.
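Because sensitivity and specificity depend so heavily on the chosen cut-off point, a small Python sketch may help. The scores below are hypothetical and are not taken from the TLT norm data. The sketch also assumes that lower scores mean poorer performance and that a score at or below the cut-off is classified as impaired.

```python
def sensitivity_specificity(patient_scores, healthy_scores, cutoff):
    """Return (sensitivity, specificity) for a 'score <= cutoff is impaired' rule."""
    true_positives = sum(score <= cutoff for score in patient_scores)
    true_negatives = sum(score > cutoff for score in healthy_scores)
    return (true_positives / len(patient_scores),
            true_negatives / len(healthy_scores))


# Hypothetical test scores, for illustration only.
patients = [10, 12, 14, 15, 18, 20, 22, 25]
healthy = [16, 19, 21, 23, 24, 26, 28, 30]

for cutoff in (20, 15):
    sens, spec = sensitivity_specificity(patients, healthy, cutoff)
    print(f"cut-off {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this made-up sample, moving the cut-off from 20 to 15 raises specificity from .75 to 1.00 but lowers sensitivity from .75 to .50. Reported values always belong to one particular cut-off and one particular sample.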
The TLT currently is only available in Dutch and can be downloaded from www.pyramidproductions.nl. An English version will become available at the end of 2011 with an English manual as well.
Below you see an Evaluation Table for the TLT. Using this Table, every neuropsychological test can be evaluated. For more information about the evaluation criteria, see my page on Test-Psychology:
Go to Test-Psychology
WAIS-R Picture Arrangement
The Picture Arrangement (PA) sub-test of the Wechsler Adult Intelligence Scales (WAIS) is not really known as an executive test. In the ‘bible’ of neuropsychological diagnostics, Lezak’s Neuropsychological Assessment (2004), this test is never described as an executive function test. In the other bible, A Compendium of Neuropsychological Tests (Strauss, Sherman, & Spreen, 2006), it is never even mentioned separately! In the Dutch guide for diagnosing executive functions, in which I played a minor role, Picture Arrangement is mentioned as one of the many tests to specifically assess conceptual reasoning skills, as part of the executive functions. In the international literature, to my knowledge, Picture Arrangement is almost never mentioned as an executive function test. And that is astonishing, to say the least, because…it is one of the finest tests of executive function around!
This failure to interpret PA as a very fine example of an executive function test is typical of the sorry state of affairs neuropsychology is in when you critically consider its arsenal of tests. Not only do we still lack many very good neuropsychological tests (there is a lot to be concerned about and many tests need serious improvements), but scientists and clinicians also differ considerably in how they interpret the same neuropsychological test, as either an attention test or more of an executive function test. In scientific articles researchers usually echo each other in what they are saying just to be sure to get their article published. Independent critical thinking is rarely seen. That’s one of the main reasons that new, innovative, breakthrough neuropsychological tests are rare.
I do not want to be extremely critical; that’s destructive talk and doesn’t help anyone. It is more constructive to explain in detail why a test like Picture Arrangement can be (and IS) such a fine executive function test. First of all, the skills for correctly doing this test are all part of what scientists call ‘executive functions’. Remember, executive functions are really several different cognitive functions all lumped together: concept formation, formulating a plan (planning), formulating a goal, sequencing the correct order of steps to take in order to reach a goal or follow a plan (logical reasoning), executing the steps and monitoring your own actions, mental flexibility to reformulate a plan and change the actions to reach your goal/plan and the ability to control your automatic, instinctive or impulsive reactions in order to follow your action plan consistently. In short, executive functions are functions that represent goal-directed actions: taking initiative, planning, executing actions, monitoring and self-correcting those actions.
And Picture Arrangement has it all. It is a test in which you see, for example, 5 black-and-white line drawings of cartoon pictures. All are shown in the wrong order and you have to make up a logical story for these 5 pictures and place them in the (one) correct order. I cannot give you a specific example of this test, of course, because that is against copyright rules and this kind of knowledge can influence your test score whenever you are confronted with this test (normally only when you have had a head injury, so let’s hope not). However, I can give you an example with a cartoon I borrowed from Zack Thomack, from his website www.polaroppositescomic.com, with many thanks to him. See the Figure below.
I deleted the text (which was quite amusing, by the way) and now your task is to give the right and only correct order of the three pictures. Without much effort you will quickly notice that only the order 2-1-3 is the correct one. You can verify this on Zack’s website here: www.polaroppositescomic.com (it’s worth looking around).
This seems like an easy task and normally that is indeed the case. It IS an easy task. The same goes for the original Picture Arrangement items as well. Especially when you as a clinician or researcher get rid of the ambiguous items (the ones where you can have more than one logical order of the pictures and have more than one logical story). I always use only 7 items of the original WAIS-R PA: House, Romeo, Louie, Enter, Hunt, Hill, and Robber because these are the unambiguous items.
The task requires several cognitive processes. First, you have to see all pictures in clear detail. Then you have to form several ideas about what is going on here. That’s what neuropsychologists call ‘concept formation’ or ‘formulating a plan’. Then you have to form a picture in your mind of a sequence of the 3 pictures. Not randomly, but guided by your plan or idea about what is going on in this story. Then you have to put these pictures in this planned order and finally you have to check whether it matches your plan/ideas. When the sequence looks wrong, you will have to correct the order. Remember, in the PA test all pictures are separate pictures you can move around. I have programmed these pictures so that they are displayed on a computer screen and can be moved by the clinician. The patient only has to point at a picture and at where it should go. Apart from planning, sequencing and monitoring, cognitive processes like visual perception, divided attention and memory all work together to do this task correctly.
Of course, the example given was a simple one. There are more items and they become more difficult, meaning that you have to make up more different ideas for how the story really goes. Then you have to match these with the correct sequences according to your ideas. Finally, you will find only one correct sequence and one correct story. Failing to do so can have several reasons: a. not making up the correct plan, b. not using the correct logical reasoning, c. not checking carefully all details against your ideas, d. not being able to hold several different story lines all at once in your attentional span (working memory). As you have figured out by now, all executive functions are tested with this simple task. And indeed, in my clinical experience (I have administered this task to about 2500 patients by now) it is a very sensitive task for detecting brain injury, especially right hemisphere damage, errors in logical reasoning and problems with insightful reasoning. Of course, you have to take into account that low intellectual capacities also lead to poor performances on this test. But then again: poor intellectual capacity is another way of saying that there are serious problems in problem solving, in executive functions.
Unfortunately, the psychometric characteristics of a test like PA are not that good. This is understandable because it IS an executive function test and therefore suffers from relatively poor test-retest reliability. Lezak (2004) notes reliability coefficients ranging from .66 to .82. Furthermore, significant practice effects hinder repeated administration of this test, just as with any other executive function test.
However, despite the mentioned disadvantages (psychometrically speaking) of the PA, I would like to call upon all clinicians and scientists to use this test much more, especially in the way I have programmed it, on a computer screen, because only then does it closely resemble a planning task like the Tower of London Test. Indeed, in my normative study (see the Manual of the Tower of London Test) I found a correlation coefficient between TLT and PA of .43 in a sample of n=232 neurological (mostly stroke) patients. A factor analysis using only 79 neurological patients showed a factor that was represented fairly well by the TLT, PA and another logical reasoning and planning test, the Wisconsin Card Sorting Test (although this last one loaded much lower than the other two tests).
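To get a feeling for how precise a correlation of .43 in 232 patients is, one can compute an approximate 95% confidence interval with the standard Fisher z-transformation. The Python sketch below is only an illustration and is not part of the normative study. It assumes the usual conditions for this approximation (roughly bivariate normal scores), and the function name is chosen for this example.

```python
import math


def pearson_r_confidence_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r via Fisher's z."""
    z = math.atanh(r)                      # Fisher z-transformation of r
    standard_error = 1 / math.sqrt(n - 3)  # standard error on the z scale
    lower = math.tanh(z - z_crit * standard_error)
    upper = math.tanh(z + z_crit * standard_error)
    return lower, upper


# Values reported above: r = .43 between TLT and PA, n = 232 patients.
lower, upper = pearson_r_confidence_interval(0.43, 232)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Under these assumptions the interval runs from roughly .32 to .53, so the correlation is clearly above zero but moderate in size.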
Below you will find the evaluation Table of the Picture Arrangement test for the paper and pencil version because this version is the most widely used and available.
Wisconsin Card Sorting Test (WCST)
One of the most used neuropsychological tests is the Wisconsin Card Sorting Test. I do not want to imply here that I am very fond of this test, just because a lot of scientists and clinicians use this test. I want to explain here why it still is a very interesting test to use but you have to be very careful in interpreting its results. Let me first tell you what kind of test it is.
There are four stimulus cards, placed in front of the subject, each showing different symbols (1 to 4 triangles, stars, crosses, or circles). No two cards are identical and there are 64 cards (the original version of 1948 had 128 cards, including identical cards). The figure below (from Wikipedia.com) shows some stimulus cards.
Unfortunately, on the Internet you can find very detailed descriptions of the test and of the strategy that is used to administer it. As I said in my introduction earlier, I do not want to give so much information about a neuropsychological test that a patient can prepare himself for a test administration. Anyone publishing this kind of information on the Internet can face prosecution by large test publishers, and rightly so, because it can seriously hinder good neuropsychological assessment.
The WCST was originally developed in 1948 to study ‘abstraction ability’ and ‘the ability to shift cognitive strategies’. It is still considered to be an executive function test, and indeed it is. To perform correctly on this test you have to plan, execute your plans, monitor your actions and feedback, and change your hypothesis (plan). In fact, it is a very simple test but, astonishingly, a lot of healthy students perform rather disappointingly as well, just as a lot of patients do. So, in theory the test seems very easy but in clinical practice you see quite a lot of patients who score very poorly on this test. That is why the WCST should never be used alone: it has to be administered with other executive tests as well. This advice applies to all neuropsychological tests, I know, but it is especially valid for executive tests. One of the largest problems with executive tests is that they depend on only a few cognitive strategies. If you know these few strategies, you will perform excellently on such a test. That’s also the reason that most executive tests cannot be administered twice: once a patient knows the strategy (i.e. how to do the test), he performs very well on the test. In other words, you then have a ceiling effect and the test cannot be used as a diagnostic instrument.
So why am I still using the WCST in my daily practice as a clinical neuropsychologist, even though there are a lot of normal healthy persons who cannot do this test adequately? Well, it really tells you something about executive functions, and especially one aspect of executive functioning, namely the ability to think things through flexibly and logically. In fact, this is the planning aspect of executive functions. In order to perform correctly on the WCST you have to look carefully at the feedback that is given after every single step you take, and you have to take the initiative to logically form several hypotheses and then test each hypothesis consistently (according to your plan/hypothesis). If you do not do that, if you don’t ‘stop and think’, then you will perform poorly on this test. The ability for self-correction, that is, changing your course of action in the light of corrective feedback, is very clearly tested in the WCST. It is challenged much more than in any other executive test I know. That’s probably the reason the WCST is still being used today, even after more than 60 years. Unfortunately, not every clinician or scientist seems to be really aware of this fundamental aspect of the test.
Another fundamental aspect of the WCST is closely related to the above-mentioned changing of your actions: sufficient focus on the right hypothesis. By that I mean that a logical hypothesis should be maintained long enough in working memory to put it to the test. Research by Barceló, for example, showed that patients really had trouble maintaining their hypothesis (set) in working memory, either due to interference from other cards (distraction) or for other reasons yet unknown. In other words: a low performance on the WCST is not simply a real perseverative symptom (unless the number of Perseverative Errors is very high), but can also be related to a very weak concentration or executive attentional control (cognitive control).
There are several versions of the WCST and that makes comparison of scientific studies quite difficult. There is the original 128-card version in which there are more similar cards. Heaton (1981) standardized the test instructions and scoring procedures and later refined them in the 1993 manual. He was the first to really publish the task as a clinical instrument. I myself use the shortened 64-card version and have programmed the Milwaukee version of Osmon & Suchy (1996). However, the Milwaukee version, in which patients had to tell beforehand why they chose a specific card, seemed promising but did not make much difference in clinical practice, so I discontinued using it.
Because the paper and pencil version of the WCST is quite difficult to score correctly, even for very experienced clinicians, it is really advisable to use the computerized editions of the WCST. Furthermore, using the computer seriously reduces total test time.
The norms of the WCST are available but they are from 1993, so quite old. They are based on N=899 people from the American population, ranging in age from 6 to 89 years. However, there is also a relatively recent meta-analysis of 34 studies comprising about 3000 healthy adults. The two main variables of the WCST are Number of Categories achieved (range: 0-6) and Number of Perseverative Errors. The meta-analysis shows an age effect on the WCST: people of 55 years or older have a mean Categories score of 3.99 (SD=1.83) and 20-35 year olds have a mean score of 5.58 (SD=1.1) (Strauss et al., 2006). The most important and statistically most usable variable is the Number of Perseverative Errors. The older age group had a mean of 15.85 errors (SD=11.44) and the younger group a mean of 6.92 (SD=5.04). Heaton’s norms are less stringent and give 11 perseverative errors as average in a young group, and 21 in an older (70-74 yrs) group. This already shows that even ‘normal’ healthy controls make quite a few perseverative errors on the WCST. Furthermore, the higher age groups perform less well on the WCST and I do NOT think this should be considered normal. To be precise: although it may seem average for an older age group, a larger number of errors in such a very simple task does mean that the flexibility in changing cognitive sets is less than in younger adults. As a neuropsychologist I would not simply conclude that this is normal, although it may be average compared to older adults.
Other norms are available for healthy Canadian children (n=685), aged 9 to 14 years. These are preferable to the Heaton norms because they are more representative and more recent (1996; for the norms see Strauss, Sherman, & Spreen). A major problem with all these norms is that they are valid for the paper and pencil version of the 128-card version of the WCST but not for the computerized versions. Furthermore, caution is advised when using the 64-card version because of a lower sensitivity to more subtle cognitive problems.
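The difference between ‘average for your age group’ and ‘normal’ can be made concrete with z-scores. The Python sketch below uses the meta-analysis means and standard deviations for Perseverative Errors quoted above. The raw score of 15 errors is a hypothetical patient score chosen for illustration, and the dictionary and function names are just labels for this example.

```python
# Perseverative Errors norms from the meta-analysis cited above
# (Strauss et al., 2006): age group -> (mean, standard deviation).
PERSEVERATIVE_ERROR_NORMS = {
    "20-35 yrs": (6.92, 5.04),
    "55 yrs or older": (15.85, 11.44),
}


def z_score(raw_score, mean, sd):
    """Number of standard deviations a raw score lies above the group mean."""
    return (raw_score - mean) / sd


raw_errors = 15  # hypothetical patient score, for illustration only

for group, (mean, sd) in PERSEVERATIVE_ERROR_NORMS.items():
    print(f"{group}: z = {z_score(raw_errors, mean, sd):+.2f}")
```

The same 15 errors lie about 1.6 standard deviations above the mean of the younger group, yet almost exactly at the mean of the older group. Comparing only within the patient's own age group can hide that reduced flexibility.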
The test-retest reliability scores in healthy adults are disappointingly low for the main variables in the WCST. The difficulty is also that there are different studies with most of the time only small test-retest groups (< 100) and using different WCST variables. From what I’ve read so far test-retest reliability is usually less than .63. Furthermore, there are doubts about the second administration of the test because whenever people know the strategies and 3 rules, the WCST does not seem to measure problem-solving anymore.
All in all there are a few advantages in using the WCST, especially the computer version. It definitely is a sensitive test for executive function, especially for assessing the ability for self-correction. I know of no other test that measures this ability so clearly, especially when the number of perseverative errors is 11 or more. However, there are also quite a few disadvantages of the WCST that make the test results really difficult to interpret correctly. First of all, although the Number of Perseverative Errors is the most viable and logical variable to use in interpreting the test, there are several possible explanations for a high number of PEs. So it is not really certain what exactly the patient’s problem is. It can be a failure to focus attention or maintain a set, but it can also indeed be a serious problem in flexible thinking (shifting set). Therefore, it is strongly recommended not to use the WCST alone as an indication of executive functions. Several executive function tests have to be administered as well to try to find a specific pattern of functioning. Second, the paper and pencil version of the WCST has a difficult scoring system, although Lezak (2004) suggests some helpful tricks (p. 588). My advice would be to use a computerized version, but be cautious in interpreting the results because the norms are based on the paper and pencil versions. Third, the 128-card version is usually too long for clinical patients and highly stressful. When constantly confronted with negative feedback, a lot of patients do not want to continue the test. Therefore I only use the 64-card version on the computer. Fourth, it is not really certain how the Heaton (American) norms can be compared to European populations. My guess is that there should not be much difference between Caucasian American and European people, but nevertheless caution in interpreting norms is advised.
Fifth, the WCST has low test-retest reliabilities in a normal population. This is not surprising, because most healthy people perform at ceiling level. The reliabilities in clinical populations are more reassuring, so retesting a patient with sufficient time between administrations (say at least 3 months) should be possible. But whenever the WCST is done almost perfectly, there is no reason at all to do a retest. That makes the test quite useless in research designs.
The WCST is available at all major test publishers. Links are provided below:
Psychological Assessment Resources:
Psychological Assessment Resources (PAR, Inc.): for the Modified Wisconsin Card Sorting Test using only 48 cards. 10 minutes to administer, 3 minutes to score. N=327 healthy people, age range 18-90 yrs (so not children).
From the same company you can order the original WCST in a computerized, 64-card version. It is easy to administer (10-15 min) and to score (10 min). The software is for unlimited use and costs 575,00 $. Highly recommended is the Card version Professional Manual for an additional 114,00 $.
Clinicians who prefer not to use a computer can order the 64-card paper and pencil version as well for just 288,00 $, excluding the Card version Professional Manual.
Western Psychological Services (WPS) is a bit cheaper:
Western Psychological Services
For the 64-card version of the WCST you pay here: 295,00 $, paper and pencil version. They have different norms (18 to 89 yrs) and (6 to 17 yrs), probably Heaton’s norms.
The original Heaton 128-card version, paper and pencil version can be purchased for 389,00 $. Heaton norms (1993).
For some very interesting articles about the WCST see the following links:
Below you will find the Test-Evaluation Table for the WCST. It was very difficult to compose because most research had been done with the paper and pencil 128-card version of the WCST. Furthermore, there are many different studies with very different brain injury type patients, in different languages as well.
Summary: what is the best executive function test?
I really don't know. I have only reviewed 3 tests commonly seen as executive function tests. However, their total points are disappointing, ranging from 19 to 23. Although I do feel that the TOL is a rather good executive function test, I feel this even more strongly for Picture Arrangement. Although this test is used for executive function assessment, it is not used much. Strauss et al. (2006) recommend using more measures and also using questionnaires to assess executive functions. Interviews with caregivers and/or partners are highly recommended as well.