What is the validity of a test measured by?


Test validity incorporates a number of different validity types, including criterion validity, content validity and construct validity. If a research project scores highly in these areas, then the overall test validity is high.

Criterion Validity

Criterion validity establishes how well a test's results correspond to an external criterion, such as performance on an established benchmark or a later real-world outcome.

  • Concurrent validity measures the test against a benchmark test, and high correlation indicates that the test has strong criterion validity.
  • Predictive validity is a measure of how well a test predicts abilities, such as measuring whether a good grade point average at high school leads to good results at university.

Content Validity

Content validity establishes how well a test compares to the real world. For example, a school test of ability should reflect what is actually taught in the classroom.

Construct Validity

Construct validity is a measure of how well a test measures up to its claims. A test designed to measure depression must measure only that particular construct, not closely related constructs such as anxiety or stress.


Tradition and Test Validity

This tripartite approach has been the standard for many years, but modern critics are starting to question whether this approach is accurate.

In many cases, researchers do not subdivide test validity, and see it as a single construct that requires an accumulation of evidence to support it.

In 1975, Messick proposed that definitively proving the validity of a test is futile, because it is impossible to prove that a test measures a specific construct. In his view, constructs are so abstract that they cannot be precisely defined, so establishing test validity by the traditional means is ultimately flawed.

Messick believed that researchers should gather enough evidence to defend their work, and proposed six aspects of validity evidence that would permit this. He argued that such evidence could not justify the validity of a test in general, only its validity in a specific situation. He also held that this defense of a test's validity should be an ongoing process: any test needs to be constantly probed and questioned.

Finally, he was the first psychometric researcher to propose that the social and ethical implications of a test are an inherent part of the validation process, a major shift from accepted practice. Considering that educational tests can have a long-lasting effect on an individual, this is a very important implication, whatever your view on the competing theories behind test validity.

This new approach does have some basis; for many years, IQ tests were regarded as practically infallible.

However, they have been used in situations vastly different from their original intention, and they are not a strong indicator of general intelligence, only of problem-solving ability and logic.

Messick's methods certainly appear to predict these problems more satisfactorily than the traditional approach.


Which Measure of Test Validity Should I Use?

Academics are generally very resistant to change, and a huge number of educationalists and social scientists stick with the traditional methods.

Both methods have their own strengths and weaknesses, so it comes down to personal choice and what your supervisor prefers. As long as you have a strong and well-planned test design, then the test validity will follow.


Published on September 6, 2019 by Fiona Middleton. Revised on October 10, 2022.

In quantitative research, you have to consider the reliability and validity of your methods and measurements.

Validity tells you how accurately a method measures something. If a method measures what it claims to measure, and the results closely correspond to real-world values, then it can be considered valid. There are four main types of validity:

  • Construct validity: Does the test measure the concept that it’s intended to measure?
  • Content validity: Is the test fully representative of what it aims to measure?
  • Face validity: Does the content of the test appear to be suitable to its aims?
  • Criterion validity: Do the results accurately measure the concrete outcome they are designed to measure?

Note that this article deals with types of test validity, which determine the accuracy of the actual components of a measure. If you are doing experimental research, you also need to consider internal and external validity, which deal with the experimental design and the generalizability of results.

Construct validity

Construct validity evaluates whether a measurement tool really represents the thing we are interested in measuring. It’s central to establishing the overall validity of a method.

What is a construct?

A construct refers to a concept or characteristic that can’t be directly observed, but can be measured by observing other indicators that are associated with it.

Constructs can be characteristics of individuals, such as intelligence, obesity, job satisfaction, or depression; they can also be broader concepts applied to organizations or social groups, such as gender equality, corporate social responsibility, or freedom of speech.

Example

There is no objective, observable entity called “depression” that we can measure directly. But based on existing psychological research and theory, we can measure depression based on a collection of symptoms and indicators, such as low self-confidence and low energy levels.

What is construct validity?

Construct validity is about ensuring that the method of measurement matches the construct you want to measure. If you develop a questionnaire to diagnose depression, you need to know: does the questionnaire really measure the construct of depression? Or is it actually measuring the respondent’s mood, self-esteem, or some other construct?

To achieve construct validity, you have to ensure that your indicators and measurements are carefully developed based on relevant existing knowledge. The questionnaire must include only relevant questions that measure known indicators of depression.

The other types of validity described below can all be considered as forms of evidence for construct validity.

Content validity

Content validity assesses whether a test is representative of all aspects of the construct.

To produce valid results, the content of a test, survey or measurement method must cover all relevant parts of the subject it aims to measure. If some aspects are missing from the measurement (or if irrelevant aspects are included), the validity is threatened.

Example

A mathematics teacher develops an end-of-semester algebra test for her class. The test should cover every form of algebra that was taught in the class. If some types of algebra are left out, then the results may not be an accurate indication of students’ understanding of the subject. Similarly, if she includes questions that are not related to algebra, the results are no longer a valid measure of algebra knowledge.

Face validity

Face validity considers how suitable the content of a test seems to be on the surface. It’s similar to content validity, but face validity is a more informal and subjective assessment.

Example

You create a survey to measure the regularity of people’s dietary habits. You review the survey items, which ask questions about every meal of the day and snacks eaten in between for every day of the week. On its surface, the survey seems like a good representation of what you want to test, so you consider it to have high face validity.

As face validity is a subjective measure, it’s often considered the weakest form of validity. However, it can be useful in the initial stages of developing a method.

Criterion validity

Criterion validity evaluates how well a test can predict a concrete outcome, or how well the results of your test approximate the results of another test.

What is a criterion variable?

A criterion variable is an established and effective measurement that is widely considered valid, sometimes referred to as a “gold standard” measurement. Criterion variables can be very difficult to find.

What is criterion validity?

To evaluate criterion validity, you calculate the correlation between the results of your measurement and the results of the criterion measurement. If there is a high correlation, this gives a good indication that your test is measuring what it intends to measure.
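The correlation referred to here is typically the Pearson correlation coefficient. As an illustration with hypothetical score pairs (the helper function and the numbers are invented for this sketch, not taken from the source), it can be computed in pure Python:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical scores: six applicants on a new test and on an established
# test that serves as the criterion ("gold standard") measurement.
new_scores = [55, 70, 88, 64, 92, 73]
criterion_scores = [58, 66, 90, 60, 95, 75]

r = pearson_r(new_scores, criterion_scores)
print(f"criterion validity r = {r:.2f}")
```

A coefficient near 1 (or, for inversely scored measures, near -1) suggests the new test tracks the criterion closely; a coefficient near 0 suggests it does not.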

Example

A university professor creates a new test to measure applicants’ English writing ability. To assess how well the test really does measure students’ writing ability, she finds an existing test that is considered a valid measurement of English writing ability, and compares the results when the same group of students take both tests. If the outcomes are very similar, the new test has high criterion validity.



How do you measure validity and reliability of a test?

Reliability can be estimated by comparing different versions of the same measurement. Validity is harder to assess, but it can be estimated by comparing the results to other relevant data or theory.

What are the three measures of validity?

Here we consider three basic kinds: face validity, content validity, and criterion validity.

What is the validity of a test instrument?

Validity is often defined as the extent to which an instrument measures what it asserts to measure (Blumberg et al., 2005). Validity of a research instrument assesses the extent to which the instrument measures what it is designed to measure (Robson, 2011). It is the degree to which the results are truthful.