Identify each aspect of a test as undermining either its reliability or validity.

Abstract

Despite the fact that validating the measures of constructs is critical to building cumulative knowledge in MIS and the behavioral sciences, the process of scale development and validation continues to be a challenging activity. Undoubtedly, part of the problem is that many of the scale development procedures advocated in the literature are limited by the fact that they (1) fail to adequately discuss how to develop appropriate conceptual definitions of the focal construct, (2) often fail to properly specify the measurement model that relates the latent construct to its indicators, and (3) underutilize techniques that provide evidence that the set of items used to represent the focal construct actually measures what it purports to measure. Therefore, the purpose of the present paper is to integrate new and existing techniques into a comprehensive set of recommendations that can be used to give researchers in MIS and the behavioral sciences a framework for developing valid measures. First, we briefly elaborate upon some of the limitations of current scale development practices. Following this, we discuss each of the steps in the scale development process while paying particular attention to the differences that are required when one is attempting to develop scales for constructs with formative indicators as opposed to constructs with reflective indicators. Finally, we discuss several things that should be done after the initial development of a scale to examine its generalizability and to enhance its usefulness.

Journal Information

The editorial objective of the MIS Quarterly is the enhancement and communication of knowledge concerning the development of IT-based services, the management of IT resources, and the use, impact, and economics of IT with managerial, organizational, and societal implications. Professional issues affecting the IS field as a whole are also in the purview of the journal.

Publisher Information

Established in 1968, the University of Minnesota Management Information Systems Research Center promotes research in MIS topics by bridging the gap between the corporate and academic MIS worlds through the events in the MISRC Associates Program.

Abstract

To validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the scores. An argument-based approach to validation suggests that the claims based on the test scores be outlined as an argument that specifies the inferences and supporting assumptions needed to get from test responses to score-based interpretations and uses. Validation then can be thought of as an evaluation of the coherence and completeness of this interpretation/use argument and of the plausibility of its inferences and assumptions. In outlining the argument-based approach to validation, this paper makes eight general points. First, it is the proposed score interpretations and uses that are validated and not the test or the test scores. Second, the validity of a proposed interpretation or use depends on how well the evidence supports the claims being made. Third, more-ambitious claims require more support than less-ambitious claims. Fourth, more-ambitious claims (e.g., construct interpretations) tend to be more useful than less-ambitious claims, but they are also harder to validate. Fifth, interpretations and uses can change over time in response to new needs and new understandings leading to changes in the evidence needed for validation. Sixth, the evaluation of score uses requires an evaluation of the consequences of the proposed uses; negative consequences can render a score use unacceptable. Seventh, the rejection of a score use does not necessarily invalidate a prior, underlying score interpretation. Eighth, the validation of the score interpretation on which a score use is based does not validate the score use.

Journal Information

The Journal of Educational Measurement (JEM) is a quarterly journal that publishes original measurement research, reports on new measurement instruments, reviews of measurement publications, and reports about innovative measurement applications. The topics addressed are of interest to those concerned with the practice of measurement in field settings as well as researchers and measurement theorists. In addition to presenting new contributions to measurement theory and practice, JEM also serves as a vehicle for improving educational measurement applications in a variety of settings.

Publisher Information

The National Council on Measurement in Education (NCME) is a professional organization for individuals involved in assessment, evaluation, testing, and other aspects of educational measurement. Members are involved in the construction and use of standardized tests and performance-based assessment, assessment program design and implementation, and program evaluation. NCME is incorporated exclusively for scientific, educational, literary, and charitable purposes. These include: (1) the encouragement of scholarly efforts to advance the science of measurement and its applications in education and (2) the dissemination of knowledge about the theory, techniques, and instrumentation available for measurement; procedures appropriate to the interpretation and use of such techniques and instruments; and applications of educational measurement in individual and group contexts. NCME members include university faculty; test developers; state and federal testing and research directors; professional evaluators; testing specialists in business, industry, education, community programs, and other professions; licensure, certification, and credentialing professionals; graduate students from educational, psychological, and other measurement programs; and others involved in testing issues and practices.

How will you determine that test items are reliable and valid?

For an exam or an assessment to be considered reliable, it must exhibit consistent results. Deviations from expected response patterns, such as anomalous results or responses on particular questions, can be a sign that specific items on the exam are misleading or unreliable.
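One common way to screen for such items is a corrected item-total correlation: an item that correlates weakly (or negatively) with the rest of the test is a candidate for revision. A minimal sketch in Python, using invented response data:

```python
import numpy as np

def corrected_item_total(scores: np.ndarray) -> np.ndarray:
    """Correlation of each item with the total of the remaining items."""
    n_items = scores.shape[1]
    out = np.empty(n_items)
    for j in range(n_items):
        rest_total = scores[:, np.arange(n_items) != j].sum(axis=1)
        out[j] = np.corrcoef(scores[:, j], rest_total)[0, 1]
    return out

# Hypothetical 5-item test: four items track the same ability,
# the fifth is unrelated (a "misleading" item).
rng = np.random.default_rng(1)
ability = rng.normal(size=300)
items = [ability + rng.normal(scale=0.7, size=300) for _ in range(4)]
items.append(rng.normal(size=300))
scores = np.column_stack(items)

print(np.round(corrected_item_total(scores), 2))  # the last value, near 0, flags the item
```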

What is reliability and validity with examples?

A test can be reliable without being valid, but for a test to be valid, it also needs to be reliable. For example, if your scale is off by 5 lbs, it reads your weight every day as 5 lbs heavier than it actually is. The scale is reliable because it consistently reports the same weight every day, but it is not valid because it adds 5 lbs to your true weight.
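The point can be made concrete with a short simulation. The sketch below uses invented numbers: the readings cluster tightly around the same value (reliable) but sit 5 lbs above the true weight (not valid).

```python
import numpy as np

rng = np.random.default_rng(42)
true_weight = 150.0
bias = 5.0                              # the scale is consistently off by 5 lbs
noise = rng.normal(scale=0.2, size=7)   # tiny day-to-day fluctuation
readings = true_weight + bias + noise   # one reading per day for a week

print(np.round(readings, 1))
print("spread (std):", round(readings.std(), 2))                   # small -> reliable
print("average error:", round(readings.mean() - true_weight, 2))   # about 5 lbs -> not valid
```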

What is the difference between validity and reliability in assessment?

The reliability of an assessment tool is the extent to which it measures learning consistently. The validity of an assessment tool is the extent to which it measures what it was designed to measure.
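As a rough illustration of the validity side, a scale's total score can be correlated against an external criterion it is supposed to reflect. The sketch below is a simplified example with simulated data; the criterion ("final grade") and the sample are invented, and a single correlation is only one limited piece of validity evidence.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Hypothetical data: total scores on a new quiz and an external criterion
# (e.g., final course grade) that the quiz is supposed to predict.
learning = rng.normal(size=n)
quiz_total = learning + rng.normal(scale=0.8, size=n)
final_grade = learning + rng.normal(scale=0.8, size=n)

# Does the quiz relate to the outcome it was designed to reflect?
criterion_r = np.corrcoef(quiz_total, final_grade)[0, 1]
print("criterion validity (r with final grade):", round(criterion_r, 2))
```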

What is the difference between reliability and validity?

Reliability (or consistency) refers to the stability of a measurement scale, i.e. the extent to which it gives the same results on separate occasions; it can be assessed in several ways: stability, internal consistency, and equivalence. Validity is the degree to which a scale measures what it is intended to measure.
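The reliability facets named above can be estimated directly. A minimal sketch with simulated data: Cronbach's alpha for internal consistency and a correlation between two administrations of the same scale for stability (test-retest). The items and the number of respondents are invented for illustration.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Internal consistency: items is a respondents-by-items array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(7)
trait = rng.normal(size=250)
# Four hypothetical items of one scale, administered twice a few weeks apart.
time1 = np.column_stack([trait + rng.normal(scale=0.6, size=250) for _ in range(4)])
time2 = np.column_stack([trait + rng.normal(scale=0.6, size=250) for _ in range(4)])

print("internal consistency (alpha):", round(cronbach_alpha(time1), 2))
test_retest = np.corrcoef(time1.sum(axis=1), time2.sum(axis=1))[0, 1]
print("stability (test-retest r):", round(test_retest, 2))
```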