Protecting the Integrity of Forensic Psychological Testing: A Reply to Geffner, et al. (2009)

DAVID MEDOFF, Ph.D.
Suffolk University and Harvard Medical School, Boston, Massachusetts

Recommendations regarding psychological test use made by Geffner, Conradi, Geis, and Aranda (2009) directly contradict well established standards of practice, guidelines promulgated by several professional organizations, and well-known legal standards regarding the admissibility of evidence in Court. This article refutes these recommendations and argues that test selection in the forensic context requires adherence to the principles enumerated in established case law as well as relevant ethical principles and standards of practice reflected in practice guidelines. Utilizing subjectively based and scientifically unreliable psychological assessment tools such as those recommended by Geffner and his colleagues introduces unknowable degrees of error, jeopardizes the integrity of evaluative methodologies in use, and seriously compromises the data that comprises the foundation upon which psychological opinions are rendered and legal decisions are made. In making their recommendations, these authors ignore an extensive and important body of empirical literature, professional practice guidelines, and legal rules requiring the use of scientifically supported, validated, and reliable test instruments for forensic use.

________________________

Address correspondence to David Medoff, Ph.D., Suffolk University, College of Arts and Sciences, 73 Tremont Street, Boston, MA 02108. E-mail: DMedoff@suffolk.edu

INTRODUCTION

Geffner, Conradi, Geis, and Aranda (2009) have called attention to the important issue of assessing family violence in the context of child custody evaluations. In so doing, however, they violate fundamental and established tenets of forensic psychological assessment by recommending the forensic use of several subjectively based, invalid, and scientifically unreliable psychological techniques including the Thematic Apperception Test (TAT), the Roberts Apperception Test for Children, the Sentence Completion Test (SCT), the Draw-A-Person technique (DAP), and Kinetic Family Drawings (e.g., Ackerman, 1999; American Psychological Association, 2009; Bow, Gould, Flens & Greenhut, 2006; Gould, 2005; Gregory, 2007; Heilbrun, 1992; Khan & Taft, 1983; Medoff, 2003; 2009; Melton, Petrila, Poythress, & Slobogin, 2007; Otto & Heilbrun, 2002; Weithorn, 2006; Wrightsman, 2005).

Forensic psychological testing practices that require reliance on scientifically based and empirically sound methodologies are not simply matters of tradition or convention, but reflect required compliance with relevant legal tests of scientific reliability and evidentiary rules, adherence to professional ethical codes, consideration of professional practice guidelines, and observance of well established standards of practice (e.g., American Psychological Association, 2002; 2009; Association of Family and Conciliation Courts, 2006; Bow et al., 2006; Committee on the Revision of the Specialty Guidelines for Forensic Psychology, 2006; Flens, 2005; Gregory, 2007; Heilbrun, 2001; Medoff, 1999; 2003; 2009; Otto, Edens, & Barcus, 2000). Further, the emphasis on psychometrically sound testing methods stems from the unjustified imprimatur of scientific rigor and authority attached to some invalid testing ''techniques'' as compared to other less empirical psychological assessment procedures including clinical interviews, mental status examinations, psychosocial histories, and parent-child observations.

Ironically, in their article, Geffner et al. (2009) cite several of the references mentioned previously in addition to other sources (Flens & Drozd, 2005; Gould, 2005), virtually all of which clearly support the fundamental principles of objectivity and demonstrated validity in psychological testing, yet they then make recommendations that directly contradict them. Further, they recommend the use of these subjectively interpreted and scientifically unreliable measures despite their recognition that validation research of these instruments has been characterized as ''not strong'' in the empirical literature (Geffner et al., p. 211).

FORENSIC PSYCHOLOGICAL TESTING AS A SPECIALTY

Forensic psychology has been considered a specialized area of practice for many years. In fact, the American Board of Professional Psychology (ABPP) recognized it as a specialty in 1985, and the American Psychological Association (APA) followed with its official recognition in 2001 (Packer, 2008). The specialty status of forensic psychology is further evidenced by the promulgation of the Specialty Guidelines for Forensic Psychology, first published in 1991 by a joint committee of the American Psychology-Law Society of the APA and the American Academy of Forensic Psychology (Committee on Ethical Guidelines for Forensic Psychologists, 1991). These guidelines have recently been updated and are currently in the process of being ratified by the APA (Committee on the Revision of the Specialty Guidelines for Forensic Psychology, 2006).

The practice of forensic psychology in general, and forensic psychological testing specifically, clearly requires a higher standard of practice than that necessary for clinical work (APA, 1994; 2009; Association of Family and Conciliation Courts, 2006; Bow et al., 2006; Committee on the Revision of the Specialty Guidelines for Forensic Psychology, 2006; Flens & Drozd, 2005; Hamilton, 1998; Heilbrun, 1992; 2001; Koocher, 2006; Medoff, 2003; Otto & Heilbrun, 2002; Perrin & Sales, 1994). These higher standards are not only demonstrated by the designation of forensic practice as a specialty, but they also arise from the United States Supreme Court decisions of Daubert v. Merrell Dow Pharmaceuticals, Inc. (1993), General Electric Co. v. Joiner (1997), and Kumho Tire Co. v. Carmichael (1999).

Considered together, these legal cases guide Courts by providing specific factors to be considered and weighed when determining the admissibility of expert testimony based upon scientific evidence and information stemming from other technical or specialized knowledge. The factors outlined in the Daubert decision focus on the scientific reliability of the methods relied upon in generating data for rendering expert testimony, and the Kumho decision expands the applicability of these factors to testimony based on non-scientific methods. Reliance on these factors is intended to yield greater scientific reliability of expert testimony. The factors include the (a) testability or falsifiability of the theories involved, (b) peer review and publication of the techniques in use, (c) demonstrated evidence of known or potential error rates and standards of administration for the data-generating procedures, and (d) general acceptance of the methodology relied upon.

This last factor is based on the well-known Frye standard (Frye, 1923) that states a methodology ''must be sufficiently established to have gained general acceptance in the particular field in which it belongs'' [italics added]. In the context of psychological testing as part of a child custody evaluation, the particular field in question would be forensic psychological testing and not testing performed for clinical purposes.

DISTINGUISHING FORENSIC AND CLINICAL CONTEXTS

In their article, Geffner et al. (2009) appear to justify their recommendations for the use of subjective and unvalidated measures by stating, ''projective measures have been a standard part of psychological evaluations for decades to assess themes and conflicts, as long as a particular task is not interpreted in isolation or used solely to diagnose a condition or disorder'' (p. 211). While this statement seems to be a vague reference to the Frye principle of general acceptance, it fails to recognize the specialty status of forensic assessment and its corresponding requirement for higher standards of practice. It also fails to more generally distinguish forensic work from clinical work, a principle that is well established (e.g., Committee on the Revision of the Specialty Guidelines for Forensic Psychology, 2006; Martindale & Flens, 2009; Medoff, 2003; 2009).

The Supreme Court in the Daubert (1993) decision eloquently addresses a directly relevant and parallel issue when it compares the process of scientific analysis to decision-making in legal cases:

It is true that open debate is an essential part of both legal and scientific analyses. Yet there are important differences between the quest for truth in the courtroom and the quest for truth in the laboratory. Scientific conclusions are subject to perpetual revision. Law, on the other hand, must resolve disputes finally and quickly. The scientific project is advanced by broad and wide-ranging consideration of a multitude of hypotheses, for those that are incorrect will eventually be shown to be so, and that in itself is an advance. Conjectures that are probably wrong are of little use, however, in the project of reaching a quick, final, and binding legal judgment—often of great consequence—about a particular set of events in the past (p. 597).

The parallels between scientific investigation and its clinical application in developing mental health treatments are easily identifiable, as the nature of clinical work mimics the scientific analysis described previously. That is, the delivery of mental health services is an ongoing enterprise subject to ''perpetual revision'' that routinely involves re-assessment at varying intervals, utilizes diagnoses as ''working hypotheses'' that are often changed on the basis of new information, frequently includes alternate medication trials pending therapeutic outcome, entails altering treatment goals over time, and often requires revised treatment plans. As a result, there is more tolerance for ambiguity or even error under these conditions since decisions can be revisited on the basis of new information or new conceptualizations.

Contrast this enterprise with the need for relatively ''quick'' and ultimately ''final'' outcomes of legal proceedings. After a trial has ended or an appellate decision has been rendered, forensic practitioners involved in a case simply cannot revise their opinions, conduct additional assessments using more scientifically defensible procedures, or clarify or reject opinions or conjectures that turned out to be ''wrong'' and thus ''of little use.'' As a consequence, there is a premium placed upon the scientific support for psychological testing methods relied upon to generate foundations of expert testimony. Courts may not prefer uncertainty or ambiguity in expert testimony, but they are justifiably leery of expert opinion based on subjective testing methods of questionable scientific reliability. There is, therefore, a greater need for accuracy in the first instance when conducting forensic evaluations, and the threshold for error in this context is much more limited.

Further, although the Frye ''general acceptance'' principle is one of several factors to be considered under the Daubert guidelines, it is clearly not dispositive following the Daubert decision, nor is ''general acceptance'' alone an adequate indication of scientific validity or reliability (Flens, 2005; Gould, 2005; Martindale, 2001; Medoff, 2003). In addition to failing to recognize the significance of forensic assessment as a specialty area and associated guidelines for professional practice in the forensic context, the failure by Geffner et al. (2009) to recognize this critical aspect of the Daubert case results in their misguided recommendations for the use of psychological tests or techniques that fail to meet the Daubert criteria. The Thematic Apperception Test (TAT) serves as a good example of the issues involved in the brief discussion that follows.

THE VALIDITY AND RELIABILITY OF SUBJECTIVE TESTS

The Thematic Apperception Test is a subjective technique that stems directly from psychoanalytic theory and was developed by Henry A. Murray (1943) in the first half of the 20th century to assess various unconscious motives, repressed drives, and psychological conflicts. Murray himself states in the TAT manual, '' . . . seeing that the TAT responses reflect the fleeting mood as well as the present situation of the subject, we should not expect the repeat reliability of the test to be high'' (p. 18). Nonetheless, many clinicians attribute more stable trait-like phenomena to the information gleaned from this technique.

In addition, the TAT lacks a standard method for administration by virtue of the unsystematic selection of stimulus cards involved, it lacks an adequately systematized or objective scoring system, and it lacks any empirically-based interpretive scheme for general personality description (Bow et al., 2006; Khan & Taft, 1983; Martindale & Flens, 2009; Medoff, 2003; 2009). In fact, even though Dana (2008) has recently attempted to revive a scoring system he developed in the 1950s, he himself has characterized the TAT as lacking consensual scoring and vulnerable to ''faulty interpretation via projection of clinician-related contents'' (Dana, 1996, p. 202). Others have reported that clinicians using the TAT are either unlikely to use a scoring system at all or to use an idiosyncratic amalgamation of systems. As a result, the TAT as commonly used is based on techniques of unknown validity and untested reliability (e.g., Khan & Taft, 1983; Ryan, 1985).

While some practitioners might claim the existence of a validated and ''objective'' scoring system for the TAT when used for general personality description, one of the most recently published scoring systems of this kind is described in two non-peer reviewed book chapters and is based on research from the 1950s (Dana, 2008; Santa Maria, 2008). The most recent peer reviewed research cited in the manual for this system is from 1957, the scoring system was ''validated'' on only five TAT cards (thus requiring deviation from the standard administration instructions from the original TAT manual), the scoring criteria for this system are poorly operationalized which results in increased subjective judgment on the part of the psychologist interpreting the measure, and the validation research for this system is intended to discriminate test subjects into three stratified groups characterized as ''normal,'' ''neurotic,'' and ''psychotic'' (Santa Maria, 2008). In addition, the comparison sample used in this system was collected and published in 1957 and was classified by diagnostic schemata from the 1950s. Further, the system is reportedly intended for clinical use, not forensic assessment (Santa Maria, 2008), and interpretation for the system incorporates content analysis (Dana, 2008) that is both subjective and idiosyncratic to those psychologists using the measure.

Taken together, the development of the Dana system, the outdated peer review research supporting it, the limited relevance of its interpretative categories, and the absence of sound empirical support for its use render this scoring system unsuitable for forensic use. Similar criticisms can be made of the Roberts Apperception Test for Children with a brief review of the testing manual (McArthur & Roberts, 1982; Roberts, 2006). It is important to recognize that although some researchers have developed TAT scoring systems that may be somewhat more standardized and objective, these systems are not intended for general personality description and focus on more limited constructs such as defense mechanisms and social cognition (Cramer, 1991; Westen, 1991). Further, although these systems for scoring and interpreting the TAT are clearly more current and perhaps more relevant than the Dana system, the degree to which any of these systems is utilized by practicing psychologists is questionable. In addition, the TAT, as well as the Roberts Apperception Test for Children, the Sentence Completion Test, the Draw-A-Person technique, and Kinetic Family Drawings, have been subject to major criticism and, either directly or indirectly, deemed unacceptable for forensic use (Bow et al., 2006; Flens & Drozd, 2005; Gould, 2005; Heilbrun, 1992; Khan & Taft, 1983; Lally, 2001; Martindale & Flens, 2009; Medoff, 2003; 2009; Otto et al., 2000).

PEER REVIEW LITERATURE, ETHICS, AND STANDARDS

The testing recommendations of Geffner et al. (2009) also contradict published models for forensic test selection that outline criteria for utilizing forensically acceptable testing measures (Heilbrun, 1992; Marlowe, 1995; Otto et al., 2000). It should be noted that any particular element of these recommendations is seen as necessary but not sufficient to support the use of a test in forensic assessment.

These models for test selection overlap to a considerable extent and together provide recommendations that tests acceptable in a forensic assessment context should (a) be commercially available and have documentation regarding their psychometric properties and standard methods of operation, (b) be the subject of ongoing research with demonstrated adequate levels of validity and reliability, (c) be relevant to the psychological construct underlying the specific legal issue at hand, (d) have standard administration and interpretation procedures, (e) be the subject of peer reviewed publication, and (f) include measures of response style.

Further, a recent survey by Bow et al. (2006) found that of those psychologists who routinely rely on psychological testing while performing child custody evaluations, the overwhelming majority reported that the TAT, the Roberts Apperception Test for Children, the Sentence Completion Test, the Human Figure Drawing technique, and the Kinetic Family Drawing task fail to meet Daubert criteria. In several instances these findings approached or exceeded 80% of those psychologists surveyed, although, obviously, decisions regarding admissibility remain within the purview of the finder of fact. Of interest are results of this survey indicating high usage rates for some of these measures despite their noted limitations.

The use of subjective measures that lack empirical support as recommended by Geffner and his colleagues (2009) also contradicts numerous elements of professional ethics, guidelines, and standards. For example, the APA Ethical Principles of Psychologists and Code of Conduct Standard 9.01(a) states, ''psychologists base opinions contained in their recommendations . . . , including forensic testimony, on information and techniques sufficient to substantiate their findings'' [italics added] (APA, 2002, p. 12). This raises questions as to how formal assessment tools of unknown scientific validity and reliability could be sufficient or even helpful in substantiating findings or expert opinions.

The APA Ethics Code also clearly states in Standard 9.02(a) that psychologists should rely on tests and assessment techniques ''in a manner and for purposes that are appropriate in light of the research on or evidence of the usefulness (i.e., validity and reliability) and proper application of the techniques'' [italics added] (APA, 2002, p. 12). Also of interest are the Guidelines for Child Custody Evaluations in Family Law Proceedings published by the APA which state, ''Psychologists strive to employ optimally diverse and accurate methods for addressing the questions raised in a specific child custody evaluation'' (APA, 2009, p. 5).

The Specialty Guidelines for Forensic Psychology provide additional guidance for proper forensic assessment. This can be seen in Standard 11.01 which states, ''Forensic practitioners practice in a competent manner, consistent with accepted forensic and scientific standards. They utilize forensically appropriate data collection methods . . . '' (Committee on the Revision of the Specialty Guidelines for Forensic Psychology, 2006, p. 16). These guidelines specifically address issues related to the psychometric properties of forensic measures by indicating in Standard 12.02 that forensic practitioners ''use assessment instruments whose validity and reliability have been established for use with members of the population tested or other representative populations'' (p. 18).

Standards of practice that have been vetted and published by recognized and respected professional organizations serve as yet another source of authority regarding the procedures and techniques to be relied upon in any given field. The Model Standards of Practice for Child Custody Evaluations promulgated by the Association of Family and Conciliation Courts (AFCC) in 2006 serve as an example of this and emphasize the necessity for using test measures with demonstrated accuracy and consistency.

This is made quite clear in Standard 5.6, which states, ''because evaluators are expected to assist triers of fact, evaluators have a special responsibility to base their selection of assessment instruments and their choice of data gathering techniques on the reliability and validity of those instruments and techniques'' (AFCC, 2006, p. 15). This important point is emphasized even further in Standard 5.6 which indicates that, ''evaluators shall strive to use methods and procedures of data collection that are empirically-based'' (AFCC, p. 15).

These model standards also recognize the essential legal implications for test selection and the elements of the Daubert decision in Standard 6.3, which states, ''Evaluators shall be aware of the criteria concerning admissibility and weight of evidence employed by courts in their jurisdictions'' (AFCC, 2006, p. 18). This Standard directly addresses the fundamental differences between clinical and forensic practice as well in stating: ''Some assessment instruments, data-gathering techniques, and tests that are acceptable in health care settings may not meet the evidentiary demands associated with forensic work. In selecting methods and procedures, evaluators shall be aware of the criteria employed by courts in their jurisdictions in rendering decisions concerning admissibility and weight'' (AFCC, p. 18).

This point is elaborated upon further in Standard 6.3 where it states ''evaluators shall be mindful of issues pertaining to the applicability of psychometric test data to the matters before the court . . . , and shall carefully examine the available written documentation on the reliability and validity of assessment instruments, data gathering techniques, and tests under consideration for use in an evaluation'' (AFCC, 2006, p. 18). Together, these model standards convey, in unequivocal terms, the crucial importance of selecting test instruments that have adequately established levels of validity, reliability, standards of operation, and empirical support.

THE USE OF CHECKLISTS AND INVENTORIES

Geffner and his colleagues (2009) also make recommendations regarding the forensic use of face valid symptom checklists and inventories such as the Trauma Symptom Inventory (TSI), the Child Behavior Checklist (CBCL), the Trauma Symptom Checklist for Children (TSCC), the Children's Depression Inventory (CDI), the Trauma Symptom Checklist for Young Children (TSCYC), the Child Sexual Behavior Inventory (CSBI), and the Detailed Assessment of Posttraumatic Stress (DAPS). Although these instruments may be objective in the purest sense of their test construction and scoring procedures, the term ''face valid'' means that the items of these measures are transparent and obvious, thus rendering them susceptible to intentional manipulation by the test subject (Achenbach & Rescorla, 2001; Briere, 1995; 1996; 2001; 2005; Friedrich, 1997; Kovacs, 1992; Medoff, 2003).

In addition, although several of these assessment tools do include response style indicators, they are not only face valid but are extremely leading from a forensic standpoint. That is, they provide the test subject with a detailed and comprehensive written list of symptoms to be reviewed and then either endorsed or denied. While this may not be concerning if these measures are used for clinical purposes, substantial concern is raised if they are used for forensic assessment. These concerns stem from the differences in the incentives for influencing test results in clinical versus forensic contexts, and data from these types of measures must, therefore, be interpreted within the context of the reason for referral with particular regard to potential motivation to exaggerate (''fake bad'') or minimize (''fake good'') possible symptoms.

"PROJECTION" AND OUTDATED TERMINOLOGY

The recommendations made by Geffner et al. (2009) are of concern in another way related to their categorization of the Rorschach Inkblot Method as a ''projective'' test. The psychoanalytic theoretical construct of ''projection'' was first applied to psychological testing by Frank (1939) in what has become known as the ''projective hypothesis.'' In essence, the projective hypothesis states that when test subjects are faced with ambiguous stimuli, they will respond in terms of their own internal needs, conflicts, motives, and unconscious wishes (e.g., Frank, 1939; Gregory, 2007). The concept of projection is, therefore, by definition based on a subjective process that not only requires idiosyncratic conjecture, but fosters the use of fantasy and imagination. Although the foundation for many projective techniques is based on the projective hypothesis, empirical support for this theory is lacking in the scientific literature (Martindale & Flens, 2009; Medoff, 2009; Rubin, 1981).

The TAT in its most elemental sense is the epitome of a projective measure, as it is based on psychoanalytic theory and invites the test subject into the world of imagination (Murray, 1943). The Rorschach, however, when used with the Comprehensive Scoring System, is a fundamentally different measure that is based on the theory of empiricism and requires the test subject to classify their visual perception of presented stimuli (e.g., Exner, 1993; 2003; Exner & Erdberg, 2005; Medoff, 2003; Weiner, 2003). In fact, early research on the Comprehensive Scoring System for the Rorschach has described an elaborate response process involving six elements that can be summarized as (a) encoding the stimulus field, (b) classifying identified visual percepts, (c) discarding potential responses, and (d) filtering remaining potential responses while reporting the visual image (see Exner, 2003). Further, when required to classify the stimulus into a visual image, test subjects are forced into a cognitive problem solving task that ''provokes a complex of psychological operations'' (Exner, 2003, p. 163). The ambiguity of the Rorschach stimulus cards is also less salient than traditionally believed by virtue of empirically established and reference-based ''popular'' responses defined as occurring at least once in every three test protocols.

As the science of psychological testing has advanced, practitioners have become more sophisticated in their thinking and conceptualization of the measures used in personality assessment. This has led to increased and wider recognition of the Rorschach not as a ''projective'' test but as a scientifically based cognitive-perceptual problem solving task (Calloway, 2005; Erard, 2007; Exner, 1989; 2003; Exner & Erdberg, 2005; Martindale & Flens, 2009; Medoff, 2003; 2009; Meyer & Kurtz, 2006). This is not to say that projection does not occur on occasion as part of the response process to the Rorschach, but description of the Rorschach as a ''projective'' measure substantially mischaracterizes the nature of the instrument, its current use, and the empirically-based interpretation of the data derived from it. This only ''promotes misleading connotations that will not serve the field well as we seek to have a more differentiated understanding of assessment methods'' (Meyer & Kurtz, 2006, p. 224). In fact, there have been recent calls in the literature to reconsider the characterization of the Rorschach in this manner (see Bornstein, 2007; Meyer & Kurtz, 2006).

SUMMARY AND CONCLUSIONS

The empirical literature, codes of professional ethics, and published standards of practice clearly demonstrate that forensic psychological testing requires reliance upon those test instruments that are empirically supported and scientifically-based through research and standardization. Professional guidelines and widely accepted standards of professional practice require that well established and clearly defined systematized procedures for administration, scoring, and interpretation of forensically acceptable measures should be strictly followed. This is especially true in the performance of forensic psychological testing, as standards of practice for forensic psychology are higher than those for psychological testing conducted for clinical purposes.

In their recommendations for the use of the TAT, the Roberts Apperception Test for Children, the Sentence Completion Test, the Draw-A-Person technique, and Kinetic Family Drawings, Geffner and his colleagues (2009) advocate for the use of techniques known to fall short of professional standards for scientific validation and reliability and, therefore, short of meeting legal criteria for admissibility as a basis for expert opinion. Use of these measures is forensically unacceptable, violates standards of practice for forensic psychological testing, and jeopardizes the integrity of the evaluative methodologies in which they are used. In making their recommendations, Geffner and his colleagues ignore an extensive and important body of empirical literature, practice standards, and ethical guidelines requiring the use of scientifically supported, validated, and reliable test instruments for forensic purposes. Further, continued reliance on assessment measures that lack adequate indicia of scientifically robust validity and reliability risks perception by the courts and the public of psychologists as practitioners of idiosyncratic methods against which the courts must now guard in the post-Daubert era.

Reliance on psychological tests or measures that lack standards of operation, acceptable comparison groups, known error rates, and demonstrable levels of validity or reliability necessarily raises profound questions as to the integrity and validity of test results. This is because such measures produce ''findings'' that are subjective and speculative. As a result, the invalid and unreliable data from these measures are fraught with unknowable levels of error for which there is no accountability. Subsequently, impressions and conclusions based on such tainted data are also called into question, and this seriously compromises the foundation of data upon which legal decisions are made.

The use of subjective instruments and the absence or violation of standardized procedures also raise serious questions regarding the overall methodologies in use and the decisions made by professionals relying on such measures. Because results from various empirically sound psychological tests utilize comparisons between an individual test subject and data from normative or reference groups gathered under standardized test conditions, violations of standardized administration procedures can negate any such comparisons and, thus, render data from these tests questionable. In addition, the use of these methods seriously calls into question factors related to admissibility under the United States Supreme Court decisions of Daubert (1993), Joiner (1997), and Kumho (1999), regardless of the status of these cases in any particular state. Despite the recommendations made by Geffner et al. (2009), psychologists would be well-advised to reject use of forensically indefensible assessment tools when in a forensic context.

REFERENCES