Report on the presentation at the International Meeting of the Psychometric Society (IMPS 2011):
"Setting a Target Test Information Function for Assembly of IRT-Based Classification Tests"

 This report describes the poster presentation entitled "Setting a Target Test Information Function for Assembly of IRT-Based Classification Tests," which was given at the International Meeting of the Psychometric Society (IMPS 2011), held at the Hong Kong Institute of Education, Hong Kong. More than 50 posters were presented in the session, and researchers in psychometrics engaged in active discussion.

 Test assembly is the selection of an optimal set of items from an item pool so that the resulting test meets a given criterion for measurement accuracy. The measurement accuracy of a test based on item response theory (IRT) is represented by the test information function (TIF), so one needs to specify a target TIF for test assembly. On the one hand, specifying the target TIF requires consideration of several factors, such as the desired level of measurement accuracy (i.e., the standard error of the ability estimate) at each ability level, the overall characteristics of the items in the item pool, and simulation results. On the other hand, it would be useful if there were a systematic method to derive a target TIF from a small number of "conditions." The current study focused on the assembly of classification (i.e., pass/fail) tests and proposed a method that numerically computes an optimal target TIF when the following two values are given: (a) an acceptable misclassification rate, which is the theoretical probability that an examinee whose true ability is at the pass (or fail) level will be erroneously judged as fail (or pass), and (b) a threshold on the ability scale for pass/fail classification.

 The problem was formulated in the framework of "statistical decision theory" in order to define an optimal TIF. Given a certain decision rule and loss function, the "risk function" for the pass/fail classification is the misclassification rate conditional on ability. Its expectation with respect to the population distribution of ability is called the "Bayes risk," which in this case is equivalent to the overall misclassification rate. Note that computing these misclassification rates requires the TIF to be known. Standard decision theory looks for the decision rule that minimizes the Bayes risk. In the current problem, by contrast, the decision rule is fixed, and we are interested in deriving a target TIF that keeps the Bayes risk below a given value.
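 To make the two quantities concrete, the following sketch computes a risk function and its Bayes risk for a known TIF. The report does not state the decision rule, loss function, or population distribution, so everything here is an assumption: the ability estimate is taken to be normally distributed around the true ability with variance 1/TIF, an examinee passes when the estimate is at or above the threshold `cut`, the loss is 0-1, and ability follows a standard normal distribution.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def risk(theta, cut, tif):
    """Misclassification rate conditional on true ability theta.

    Assumes estimate ~ N(theta, 1 / tif(theta)) and the rule
    'pass if estimate >= cut' (assumptions, not from the poster).
    """
    return normal_cdf(-abs(theta - cut) * math.sqrt(tif(theta)))

def bayes_risk(cut, tif, lo=-6.0, hi=6.0, n=2400):
    """Overall misclassification rate: midpoint-rule integral of
    risk(theta) times the assumed N(0, 1) population density."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        total += risk(t, cut, tif) * normal_pdf(t)
    return total * h

# Hypothetical flat TIF of 10 with the threshold at the population mean.
print(bayes_risk(0.0, lambda theta: 10.0))
```

 Under these assumptions the risk at the threshold itself is always 0.5, whatever the TIF, which is one way to see why accuracy near the threshold is inevitably low.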

 Computation of the Bayes risk involves complicated integration, and optimizing the TIF contained in the Bayes risk is even harder. The risk function (i.e., the conditional misclassification rate) is more manageable. If one assumes a certain functional form for the risk function, the resulting Bayes risk has an analytical form. The risk function is then uniquely determined once the threshold and the upper limit of the overall misclassification rate are given, and it in turn uniquely determines the target TIF. TIFs obtained in this manner were plotted for varying values of the threshold and the overall misclassification rate.
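 The chain "threshold and overall rate, then risk function, then target TIF" can be sketched as follows. The functional form used on the poster is not given in this report, so the sketch substitutes a hypothetical one, r(theta) = 0.5 * exp(-k * (theta - cut)^2), chosen only because its Bayes risk under a standard normal ability distribution has a closed form. The normal approximation for the ability estimate and the rule "pass if the estimate is at or above the threshold" are likewise assumptions, so the numbers produced here should not be expected to match those reported for the poster.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def bayes_risk_closed_form(k, cut):
    """Bayes risk of the hypothetical r(theta) = 0.5*exp(-k*(theta-cut)**2)
    when theta ~ N(0, 1)."""
    return 0.5 / math.sqrt(1.0 + 2.0 * k) * math.exp(-k * cut * cut / (1.0 + 2.0 * k))

def solve_k(target, cut, lo=0.0, hi=1e6, iters=200):
    """Bisection for k; the closed-form Bayes risk decreases monotonically
    in k. The target must lie below 0.5 and above the risk at k = hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bayes_risk_closed_form(mid, cut) > target:
            lo = mid  # risk still too high: need a steeper risk function
        else:
            hi = mid
    return 0.5 * (lo + hi)

def target_tif(theta, k, cut):
    """Invert r(theta) = Phi(-|theta - cut| * sqrt(TIF)) for the TIF."""
    d = theta - cut
    if d == 0.0:
        return 0.0  # limit of the expression below for this particular form
    r = 0.5 * math.exp(-k * d * d)
    z = STD_NORMAL.inv_cdf(r)  # negative, since r < 0.5
    return (z / d) ** 2

# Threshold at the population mean, overall misclassification rate 10%.
k = solve_k(0.10, 0.0)
for theta in (0.0, 0.25, 0.5, 1.0):
    print(theta, target_tif(theta, k, 0.0))
```

 The evaluation points are kept within a few units of the threshold because `inv_cdf` requires a probability strictly between 0 and 1, and the assumed risk underflows to zero far from the threshold.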

 The results indicated that the target TIF became larger as (a) the threshold came closer to the population mean of ability and/or (b) the overall misclassification rate became smaller. Consider the case in which the overall misclassification rate is less than 10%. When the difference between the threshold and the population mean was 1, the maximum value of the obtained target TIF was as small as 3. When the difference was zero, however, the maximum value of the target TIF jumped to around 16. This illustrates an advantage of the proposed method: one can obtain a standard for the target TIF merely by specifying two values, the threshold and the overall misclassification rate.

 Counterintuitive, but interesting, results were also obtained. Since measurement (or classification) accuracy for examinees whose true ability is close to the threshold is inevitably low, it seems reasonable to set the TIF high, especially near the threshold. However, the results indicated the opposite: the obtained target TIFs all showed a sudden drop at and near the threshold. This means that the theory tells us to "give up that low accuracy."

 These results depend on the specific functional form assumed for the risk function. Other possible forms, together with other decision rules and loss functions, should be considered further, and the effects of these different configurations should be investigated.

(Kentaro Kato, Ph.D., CRET Researcher)