Investigating Chinese Language Learners’ Reading Comprehension for Different Meaning Types

The Chinese government has published an official guide specifying the aims for reading in Chinese and the expected comprehension levels for learners at different proficiency levels, with regard to teaching and learning Chinese as a second or foreign language. However, due to a lack of teacher training for its implementation, this guide has rarely been used for teaching reading to Chinese language learners or for evaluating their reading ability. Therefore, it appears that teaching and assessing reading comprehension have not been based on a theoretical background of reading ability. The purpose of this study is twofold: (1) to provide validity evidence of a Chinese reading test developed based on a theoretical model of reading ability; and (2) to examine reading test performance of Chinese learners with various reading ability levels. For the purpose of this study, reading ability was defined based on a meaning-based model that included three layers of reading comprehension: literal, intended, and implied meanings of a reading text. A total of 248 Korean university students were divided into three levels, and their test performances were analyzed and compared for the three meaning types using structural equation modeling and regression analysis. The results suggest that the test performance structure represented the meaning-based model in general, thereby providing validity evidence of the test. Further analyses revealed that the three groups differed from one another with respect to their understanding of literal, intended, and implied meanings. The findings provide pedagogical implications for teaching Chinese language learners with different reading proficiency levels.


INTRODUCTION
The emergence of Communicative Language Teaching (CLT) and theoretical models of communicative competence have influenced not only teaching second or foreign languages, but also assessing language ability. Teachers have tried to focus more on fluency, purposeful/meaningful communication, and the use of authentic materials in language classrooms, rather than on the accurate usage of language, pattern drills, and practice. Emphasis on communicative language use has also brought about gradual changes in defining test constructs. Recently, constructs of reading ability have been defined for the evaluation of language learners' ability to understand the surface and underlying meaning of a writer's message, instead of the evaluation of learners' ability to understand sentence structures or sentence-level meanings.
Volume 14 (1), February 2014 (http://dx.doi.org/10.17576/GEMA-2014-1401-06) ISSN: 1675-8021
This emphasis on the communicative aspects of reading is also evident in the language education policy for Chinese language learners. The Chinese Language Proficiency Scales for Speakers of Other Languages, developed by the Office of Chinese Language Council International (2007), serve as criteria for Chinese teaching, learning and assessment, focusing on real-world language use. According to the scales, reading comprehension in Chinese is defined as including the comprehension of (i) correspondence in social interactions (e.g., a card of congratulations from a friend and private correspondence), (ii) instructive and explanatory texts (e.g., brief introductions to new books and popular science articles), and (iii) various kinds of informative texts (e.g., posters on a college campus and job advertisements). Such definitions reflect a concentration not only on literal meaning, but also on the pragmatic aspects of the Chinese language, unlike the Chinese language teaching and learning of the past (Zhou & Li, 2009).
The scales further define Chinese reading ability separately for different ability levels. The beginner level is defined as the ability to read and understand simple narrative or descriptive texts that are related to everyday life, grasping the main and concrete ideas. The intermediate level further includes the ability to understand the intention of the author, in addition to the requirements of the beginner level. The advanced level is defined as the ability to understand abstract, conceptual or technical information from the texts, read between the lines, and understand the author's viewpoints or intentions. As such, different purposes of reading education and assessment have been established, with varying lengths and types of reading texts and levels of meaning required for comprehension.
Even though reading ability has been described concretely in the scales, the definitions of different levels of reading ability have rarely been operationalized for reading assessment in China, as well as in other countries teaching Chinese as a second or foreign language, whether it be a high-stakes large-scale test or a classroom assessment. In addition, there seems to be little research that seeks to validate these reading tests (e.g., Kim & Park, 2013; Jeong, 2008; Seong, 2010) based on a theoretical model of reading ability (e.g., the Chinese Language Proficiency Scales for Speakers of Other Languages).
To this end, the present study aimed to (1) define the construct of Chinese reading ability based on a theoretical framework, and (2) investigate the extent to which Chinese learners of various proficiency levels understand different layers of meanings (i.e., literal, intended, and implied). The findings of this research are expected to suggest a theoretically defined and empirically operationalized construct of reading ability for Chinese teaching and learning. Ultimately, a deeper understanding of reading ability will lead test developers and teachers not only to develop reading tests that measure learners' understanding of various layers/types of meanings beyond the literal meanings of texts, but also to make appropriate interpretations of learners' reading ability.

LITERATURE REVIEW

DEFINING AND TESTING SECOND LANGUAGE (L2) READING ABILITY
There have been continuous attempts to define reading ability in L2 reading research. Traditionally, L2 reading ability has been defined in terms of its processes or products/components. Many researchers who focus on reading processes try to depict the mental activities involved in reading. Such mental activities are most commonly discussed using three processing models: top-down, bottom-up, and interactive models. The top-down model emphasizes the importance of activating existing schemata and the involvement of readers' knowledge in the reading process (Alderson, 2000). Readers bring meaning to the text based on their prior knowledge and experience, and are actively involved in creating meaning out of the text. Contrary to the top-down model, which emphasizes readers' active role, the bottom-up model considers readers as passive decoders who process independent, sequential graphic, phonemic, syntactic, and semantic systems in order. According to Alderson (2000), bottom-up approaches are "serial models, where the reader begins with the printed word, recognizes graphic stimuli, decodes them to sound, recognizes words and decodes meanings" (p. 16). Top-down and bottom-up models explain the reading process in different ways; however, neither processing model alone can fully explain the reading process. When individuals are engaged in reading, they selectively employ either the top-down or bottom-up process to comprehend meanings and compensate for deficiencies in the other. According to the interactive model, top-down and bottom-up approaches can occur either at the same time or alternately, depending on the reading texts, reader characteristics, and reading purposes (Alderson, 2000).
The other attempt to define reading ability focuses on its products or components. Since reading ability is regarded as divisible, many researchers have tried to identify the separate elements involved in reading. This view has not only encouraged researchers to propose taxonomies of reading skills to define what it means to be able to read (e.g., Carroll, 1993; Grabe, 1991), but it has also provided implications for L2 reading assessment. Weir (1997) argues that "if specific skills, components or strategies could be clearly identified as making an important contribution to the reading process, then it would of course be at least possible […] to test these and to use the composite results for reporting on the reading proficiency" (p. 44). Oftentimes, sub-skills identified under reading ability (e.g., identifying the main idea, understanding details, and inferencing) have served as operational definitions of reading ability, especially for large-scale multiple-choice tests (e.g., Alderson, 1990; Lumley, 1993).
Since neither attempt can fully explain reading performance without the other, researchers have recently proposed a new approach to define reading ability, incorporating both the process and product of reading. Within a broader framework of language knowledge (Purpura, 2004), Liao (2008) and Kim, A. Y. (2011) define reading ability in terms of the different types of meanings obtained from the text. Purpura's (2004) language knowledge is divided into (1) grammatical knowledge, including grammatical form (various linguistic forms) and grammatical meaning (literal and intended meaning of utterances), and (2) pragmatic knowledge, including contextual, sociolinguistic, sociocultural, psychological, and rhetorical meanings. While explaining language knowledge, Purpura (2004) differentiates three different types of meanings involved in language ability: literal, intended, and pragmatic/implied meanings. Kim, A. Y. (2011) adopted Purpura's (2004) meaning-based framework and operationalized L2 reading ability in terms of three types of meanings (literal, intended, and implied). That is, literal meaning, which requires understanding the surface-level meaning derived from a text, is differentiated from the other two types of intended and pragmatic meanings, which require some sort of inferencing. Intended and pragmatic meanings are distinguished based on whether inferencing is made within the text (intended) or outside/beyond the text (pragmatic/implied). Kim, A. Y. (2011) argues that this new approach to define reading ability better explains L2 reading than earlier attempts, since the primary purpose of reading is to understand meanings that are derived in various ways from texts. Therefore, the three types of meanings are essential in understanding and testing individuals' reading ability.
However, there have been only a few attempts to define and test reading ability based on such a meaning-based model, especially in a second or foreign language other than English. Specifically, in the case of Chinese reading assessment, the Chinese Language Proficiency Scales (2007) define reading ability by focusing on different types of meanings that learners are expected to understand at different proficiency levels. Even though the definitions are very similar to the three types of meanings specified in Purpura's (2004) model, it is unclear whether or not these definitions came from any theoretical framework of reading ability or proficiency. Moreover, reading ability has not been operationalized for Chinese reading test development. Therefore, the present study aims to provide empirical evidence of the construct validity of a Chinese as a foreign language (CFL) reading test, which has been developed based on Purpura's (2004) meaning-based model. In addition, this study aims to explore the nature of CFL students' reading ability at different levels by investigating the extent to which they understand different types of meanings while reading.

TESTING THE READING ABILITY OF CFL LEARNERS
For teaching and learning Chinese reading, extensive (top-down model) and intensive (bottom-up model) reading approaches have widely been used (e.g., Chen, 2010; Zhao, 2004; Zhou & Li, 2009). As Chinese language teaching at the secondary level puts equal weight on all four skills of listening, speaking, reading and writing and emphasizes the integration of language skills, integrated-skills tests are the most common type of assessment. On the other hand, in higher education the four skills are often taught separately as independent courses (Zhou & Li, 2009). Therefore, Chinese language courses at the level of higher education often use independent-skills tests (e.g., achievement and diagnostic tests) specified for each literacy area (e.g., reading). However, it seems at present that there is very little research on how to test independent language components or skills at the higher education level.
As the New Hanyu Shuiping Kaoshi (New HSK) is the most common and representative Chinese language test, most scholarly discussions on Chinese language assessment focus on assessment using the New HSK (e.g., Kim, A. Y., 2011; Jeong, 2008; Seong, 2010). The New HSK is the best-known Chinese language proficiency test developed based on the Chinese Language Proficiency Scales for Speakers of Other Languages, and is utilized worldwide. Despite its renown, most research on the New HSK seems to concentrate on the test format rather than on the test construct. This is also the case for research on the reading comprehension tests under the New HSK. For example, previous research has addressed such issues as evaluation of the testing system (Jeong, 2008; Seong, 2010) and comparisons between the New HSK and other Chinese language tests (Kim, M. S., 2011). It is difficult, however, to find research on test constructs or post-assessment feedback. In particular, studies on testing reading ability based on the meaning-based model have yet to appear. Therefore, research that involves defining the test construct of reading ability and then conducting an assessment is deemed to have significant academic value.
Because the New HSK test items are not accessible to the public, most studies on the New HSK (Kim, M. S., 2011; Jeong, 2008; Seong, 2010) have been based on the Chinese Language Proficiency Test Syllabus Levels 1 to 6 (The Office of Chinese Language Council International & Confucius Institute Headquarters, 2009a, 2009b, 2009c, 2009d, 2009e, 2009f), which provide an overview of the content of the New HSK. According to the syllabus, test items in the reading comprehension section include a broad array of topics such as daily life, politics, economics, society, and culture. However, the questions center mostly on meaning, and in particular, on literal meaning rather than on intended meaning. In addition, there are not many pragmatics-related items, thereby resulting in a limited number of items on implied meaning (Kim & Park, 2013). Given this situation, the test construct in the present study was defined based on the meaning-based model, and test items were newly developed to include all three types of meanings, while reading texts were directly cited from the Chinese Language Proficiency Test Syllabus.

RESEARCH PURPOSE
To reiterate, the present research aimed to examine the extent to which Chinese language learners of different reading ability levels can understand a variety of meanings through a reading test which was developed based on the operationalization of a construct definition of reading ability (Purpura, 2004). The research also investigated the differences in test performance based on different levels of reading ability. The study sought to answer the following research questions:
1. What is the underlying trait structure of foreign language test performance as measured by the Chinese reading test?
2. Do the three groups of CFL learners (beginner, intermediate, and advanced) exhibit differences in their scores on the components of reading ability?

METHODOLOGY

PARTICIPANTS
The study was carried out with 248 Korean university students as participants. All participants were attending a Chinese language course at a university located in Seoul.
Initially, college students who had Chinese-related majors (e.g., Chinese linguistics and literature, Chinese regional studies, and Chinese translation and interpretation) were invited to volunteer to participate in the current study. Among the students who agreed to participate, only students speaking Korean as their first language were selected, and native Chinese-speaking students were not included. The participants ranged from first year to fourth year at the university, and thus reflected multiple levels of Chinese language proficiency. Consequently, the participants included 91 students at the beginning reading level, 110 at the intermediate level, and 47 at the advanced level. (More information about group classification and a detailed explanation of such reading levels are provided in the Procedures section.) Five Chinese language experts participated in setting the criteria to determine the threshold scores that distinguished the students' reading levels (beginning, intermediate and advanced). These experts included two current Chinese language adjunct instructors, one HSK instructor, one doctoral candidate specializing in Chinese language education, and one university faculty member with a doctoral degree in Chinese language education. All five experts were females in their 30s to 50s. They had two to ten years of experience in teaching Chinese (e.g., conversation, HSK preparation courses, and other test preparation courses).

INSTRUMENTS

READING TEXT
The reading texts were taken from texts found in the Chinese Language Proficiency Test Syllabus. The New HSK comprises six levels, with the lowest being level 1. The basic levels of 1 to 3 were excluded from the range to be covered in the present test, while level 4 for beginners, level 5 for intermediate learners, and level 6 for advanced learners were all included. The basic levels were excluded because they focus on vocabulary and grammar instead of reading itself, considering low-beginner learners' limited language ability. For the reading texts, two short texts (1-2 sentences) and two lengthy texts (1 paragraph) were chosen from the beginning level 4, two short texts (2-4 sentences) and two lengthy texts (2-3 paragraphs) were chosen from the intermediate level 5, and two lengthy texts (3-4 paragraphs) were chosen from the advanced level 6. The reading texts included a variety of topics, such as finding friendship in hardship, the path to a good life, balancing money and time, and a great man's wisdom. The length of the text and the difficulty of the vocabulary and grammar were considered in selecting the excerpts. One of the reading texts was shortened in order to develop test items using the deleted part of the text. The texts and accompanying test items are presented in the Appendix.

READING ITEMS
The construct definition of reading ability drew on Kim, A. Y.'s (2011) definition of L2 reading ability, adapted from Purpura's (2004) language ability model. That is, the assessment goal for the reading items was set at understanding literal, intended, and implied meanings. A total of 20 items were developed based on these assessment goals. The number of items was determined for practicality reasons, and the items were pilot tested in advance. All 20 questions were presented as discrete-point multiple-choice items.
The test was composed of a total of 20 items, including seven on understanding literal meaning, eight on understanding intended meaning, and five on understanding implied meaning. Specifically, the test included six beginner-level items (three literal and three intended meaning questions), seven intermediate-level items (two literal, three intended, and two implied meaning questions), and seven advanced-level items (two literal, two intended, and three implied meaning questions). The item types are represented in Table 1. Examples of test items include "Which of the following is correct according to the passage?" (literal meaning); "What is the author's intention?" (intended meaning); and "What phrase is most appropriate in the blank?" (implied meaning). To design questions based on the test construct, 16 of the 20 items used questions from the New HSK syllabus, while the other four items were custom designed by the researchers. The syllabus rarely included intended and implied meaning items; as a result, some items had to be newly developed so that the test covered all three types of meaning.

PROCEDURES
In the first stage, the Angoff procedure (Angoff, 1971), which is used in the testing process to systematically distinguish test-takers' achievement/performance levels, was applied to the multiple-choice reading test. It was used to determine the cut-off point for each level (beginning, intermediate, and advanced). A cut-off point to distinguish between the beginning and intermediate levels was established as follows. First, the five experts met to assess their understanding of each level. For each of the 20 items on the test, each member wrote down the prospective probability that students with a minimal competence level in the intermediate group would correctly answer the item. Then the five members compared one another's probability estimates in order to identify the items that demonstrated the biggest difference. Referring to the opinions collected, the members wrote down their probabilities again. Then the probability scores were added up and divided by 20, which is the total number of test questions. Once an average was produced after collecting the results from all five members, the cut score between the beginning and intermediate levels was established. Following the same procedures, the probability for the minimal level students in the advanced level to correctly answer each item was determined in order to establish the cut score distinguishing between the intermediate and advanced levels. As a result of the Angoff procedure, the cut-off point between the beginning and intermediate levels was a score of 11 (out of 20), and the cut-off point between the intermediate and advanced levels was a score of 16 (out of 20).
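The arithmetic behind the Angoff cut score described above can be sketched in a few lines of Python. This is only an illustration: the ratings are invented (the experts' actual judgments are not reported), the function name `angoff_cut_score` is hypothetical, and the final rescaling of the mean probability to a raw score is an assumption about how cut-offs such as 11 and 16 out of 20 were expressed.

```python
from statistics import mean


def angoff_cut_score(ratings, n_items):
    """Return (panel mean probability, raw cut score) from Angoff ratings.

    ratings: one list per expert; each entry is the estimated probability
    that a minimally competent examinee answers that item correctly.
    """
    for expert in ratings:
        if len(expert) != n_items:
            raise ValueError("each expert must rate every item")
    # Per expert: add up the item probabilities and divide by the item count.
    per_expert = [sum(expert) / n_items for expert in ratings]
    # Average across experts, then rescale to the raw-score metric (assumption).
    panel_mean = mean(per_expert)
    return panel_mean, panel_mean * n_items


# Hypothetical ratings: 5 experts x 4 items (the actual test had 20 items).
example_ratings = [
    [0.6, 0.5, 0.7, 0.4],
    [0.5, 0.5, 0.6, 0.4],
    [0.7, 0.6, 0.6, 0.5],
    [0.6, 0.4, 0.7, 0.5],
    [0.6, 0.5, 0.6, 0.4],
]
panel_mean, cut = angoff_cut_score(example_ratings, n_items=4)
print(f"mean probability = {panel_mean:.3f}, cut score = {cut:.2f} out of 4")
```

In practice the resulting value would presumably be rounded to a whole-number score before being used as a threshold.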
In the second stage, the 248 participants took the reading test. The students were selected by convenience sampling. The participants who were asked to volunteer to participate in the present study were taking Chinese language courses at a university where one of the Chinese language experts (university faculty member) was teaching. The test was administered in 11 Chinese language courses and lasted thirty minutes. The third stage involved scoring all 248 tests collected from the participants. One point was given to each correct item, while zero points were assigned to each incorrect item. Thus, the maximum score was 20 and the minimum was zero. At the fourth stage, the participants were divided into three groups (beginner, intermediate and advanced) based on the cut-off points (11 between beginning and intermediate, 16 between intermediate and advanced) produced from the Angoff procedure. Accordingly, 91 students who received a score between 1 and 10 were identified as being at the beginning level; 110 students with scores between 11 and 15 were deemed as being at the intermediate level; and 47 students with scores between 16 and 20 were deemed as being at the advanced level.
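The scoring and grouping rules in the third and fourth stages amount to a simple threshold classification. The sketch below uses the cut-off scores reported above; the function names and the five-item answer key are hypothetical and serve only to show the rule.

```python
BEGIN_INTERMEDIATE_CUT = 11      # cut-off reported in the text
INTERMEDIATE_ADVANCED_CUT = 16   # cut-off reported in the text


def score_test(responses, answer_key):
    """Dichotomous scoring: one point per correct item, zero otherwise."""
    return sum(1 for given, key in zip(responses, answer_key) if given == key)


def classify(total_score):
    """Assign a reading level from a total score out of 20."""
    if total_score >= INTERMEDIATE_ADVANCED_CUT:
        return "advanced"
    if total_score >= BEGIN_INTERMEDIATE_CUT:
        return "intermediate"
    return "beginner"


# Hypothetical five-item key and response pattern, for illustration only.
key = ["A", "C", "B", "D", "A"]
answers = ["A", "C", "D", "D", "B"]
print(score_test(answers, key))          # 3 items match the key
for total in (10, 11, 15, 16):
    print(total, classify(total))
```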

ANALYSIS
Descriptive statistics of the test scores (mean, standard deviation, minimum and maximum scores, and skewness and kurtosis) were calculated to obtain information about the central tendency, dispersion, and shape of the distribution of the 248 examinees' test scores. Descriptive statistics were also calculated at the group level (beginner, intermediate, and advanced) to compare the reading performance at the three different levels. After screening the overall picture of the test scores, Confirmatory Factor Analyses (CFAs) were performed using EQS version 6.1 (Bentler & Wu, 2005) to examine the adequacy of the theoretical model of reading ability and to determine the underlying structure of test performance, as measured by the 20 reading items. CFAs are often used in validation studies because they evaluate an overall model, as well as individual parameters specified in the model. In order to evaluate the fit of the model, fit indices were examined, such as the Chi-square statistic, the Chi-square/df ratio, the comparative fit index (CFI), and the root mean-square error of approximation (RMSEA). Then, the statistical significance of the parameter estimates was checked to evaluate the fit of the individual parameters.
After assessing the structure of the reading test, an Analysis of Variance (ANOVA) was used to examine whether the three groups of examinees (beginner, intermediate, and advanced) exhibited differences in each reading test component (i.e., literal, intended, and implied meaning). After confirming group differences in the reading performance for each reading test component, a stepwise regression analysis was performed to estimate the proportion of variance in the overall reading test scores that could be predicted by the three test components. The R-square statistics were calculated to explain the test components that represented the test performance of each group and their contributions (Song, 2005).

Among the three groups, the advanced examinees showed the least variability in their scores, while the beginner examinees had the largest variability. The internal consistency reliability for the twenty-item test was also estimated using Cronbach's alpha. The coefficient alpha was 0.83, which suggests that the twenty items were measuring the same construct (reading ability) to a moderate degree.
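For readers unfamiliar with the reliability coefficient mentioned above, the following sketch shows one common way to compute Cronbach's alpha from a matrix of dichotomous item scores. It uses the standard formula with sample variances; the response matrix is invented, and the function name `cronbach_alpha` is hypothetical rather than part of any package used in the study.

```python
from statistics import variance


def cronbach_alpha(score_matrix):
    """Cronbach's alpha for rows = examinees, columns = item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(score_matrix[0])
    item_variances = [
        variance([row[i] for row in score_matrix]) for i in range(k)
    ]
    total_variance = variance([sum(row) for row in score_matrix])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0/1 responses from six examinees on four items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

When every item ranks examinees identically, the formula returns 1; weaker agreement among items lowers the coefficient.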

RESULTS OF CONFIRMATORY FACTOR ANALYSIS
In order to examine whether the hypothesized constructs of reading ability (literal, intended, and implied) would function as intended, a series of CFAs were conducted, and the final underlying structure of test performance was obtained. While testing the possible assumptions of the structure, items 2 and 3 were deleted from the analyses because the loadings of these two items were not statistically significant (absolute value of the test statistic below 1.96). Items 2 and 3 were supposed to load on the literal and intended meaning factors, respectively, for substantive reasons; however, they did not meaningfully contribute to the model, and were thus deleted from the model. The remaining 18 reading items loaded on the three trait factors, and the test structure followed a representation of the reading ability model. As seen in Figure 1, six items (1, 5, 7, 11, 15, and 18) loaded on literal meaning; eight items (4, 6, 8, 9, 12, 13, 16, and 19) loaded on intended meaning; and four items (10, 14, 17, and 20) loaded on the implied meaning factor. The items loaded on each factor as expected. The three trait factors (literal, intended, and implied meaning) were correlated with one another to a high degree (0.86 to 0.89). To assess the model as a whole, goodness-of-fit statistics were calculated. The independence model Chi-square statistic was 1,201.58 with 153 degrees of freedom (p < 0.0001), suggesting that the data did not fit the hypothesized model. However, Chi-square statistics are known to be sensitive to sample size (Byrne, 2006); thus, other fit indices were further examined. The CFI was reported as 0.903, and the RMSEA was 0.058, with a confidence interval of 0.046 to 0.069. The CFI was greater than 0.90, which indicated a well-fitting model. RMSEA values less than 0.05 are normally considered a good fit; however, values as high as 0.08 are acceptable as reasonable errors of approximation (Browne & Cudeck, 1993; Byrne, 2006). Therefore, overall, the reading performance data fit the model adequately as a whole, but not as well as anticipated.
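As a rough illustration of how the fit indices discussed above relate to Chi-square values, the sketch below implements the commonly cited textbook definitions of the Chi-square/df ratio, the CFI, and the RMSEA point estimate. All input values are hypothetical, and the formulas are an assumption about the general definitions: EQS may apply different variants or corrections (for example, some programs use N rather than N - 1 in the RMSEA denominator), so the output is not expected to reproduce the figures reported here.

```python
from math import sqrt


def chi_square_ratio(chi2, df):
    """Chi-square divided by its degrees of freedom."""
    return chi2 / df


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index from model and baseline (independence) values."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    if d_base == 0.0:
        return 1.0
    return 1.0 - d_model / d_base


def rmsea(chi2_model, df_model, n):
    """RMSEA point estimate using the N - 1 convention (assumption)."""
    return sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


# Hypothetical values chosen only to make the arithmetic easy to follow.
print(chi_square_ratio(200.0, 100))             # ratio of 2.0
print(round(cfi(200.0, 100, 1100.0, 150), 3))   # 1 - 100/950
print(round(rmsea(200.0, 100, 201), 3))         # sqrt(100 / (100 * 200))
```

Under these definitions, a model whose Chi-square does not exceed its degrees of freedom yields a CFI of 1 and an RMSEA of 0.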
After assessing the model fit, individual parameter estimates were also assessed. The unstandardized estimates showed that all parameter estimates (factor loadings, covariances, and error variances) were statistically significant, suggesting that they contributed to the model as important elements. Standardized parameter estimates revealed that factor loadings ranged from 0.19 to 0.79, while error loadings were estimated at approximately 0.75. The results suggest that the contribution of errors to the variables was not negligible, and that the variables could not be explained by the factors alone. In other words, variables other than the three factors (e.g., test method), albeit unexpected, should be considered along with the three factors when explaining the examinees' test performance on the 18 reading items.

RESULTS OF GROUP COMPARISONS
After examining the test structure of the 248 examinees' reading performance, the differences among the three groups' (beginner, intermediate, and advanced) test scores were further analyzed. In order to compare the group differences for each of the three types of reading items (literal, intended, and implied meaning items), ANOVAs were used. First, the assumptions of ANOVA were tested, including the independence of observations, homogeneity of variances, and normality.
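The F statistic at the heart of a one-way ANOVA compares between-group and within-group variability. The following sketch computes it by hand for three small groups; the scores are invented, the function name `one_way_anova_f` is hypothetical, and the p-value and the assumption checks mentioned above are deliberately left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(
        len(group) * (mean(group) - grand_mean) ** 2 for group in groups
    )
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum(
        (score - mean(group)) ** 2 for group in groups for score in group
    )
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical literal-meaning subscores for three small groups.
beginner = [1, 2, 3]
intermediate = [2, 3, 4]
advanced = [3, 4, 5]
print(one_way_anova_f([beginner, intermediate, advanced]))
```

A large F relative to the F distribution with the returned degrees of freedom indicates that at least one group mean differs; identifying which pairs differ requires a post hoc procedure.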
The results indicated that the mean scores of the three groups were significantly different.

Since each of the three ANOVA results indicated a significant difference among the three groups, a post hoc test (Tukey's HSD test) was performed to further analyze which examinee groups differed in their reading performance. The results indicated that all three examinee groups showed a significant difference from one another for each of the three item types. Therefore, it is evident that the three groups were not equivalent with respect to their ability to comprehend literal, intended, and implied meaning. The three groups' test performances are presented for each reading item type in the following graphs.

In order to further explain the group differences, a stepwise regression analysis was performed for each examinee group. Tables 3, 4, and 5 present the results of the regression analysis. Only the predictors that had a significant effect on the dependent variable (total test scores) were included in the model summary. As a result, unequal numbers of models are presented for different examinee groups.

Beginner examinees' test scores were mainly explained by the literal and intended meaning test items. The literal meaning items alone explained 74% of the total score variance, while the literal and intended meaning items explained 93% of the variance. That is, most of the score variance was explained by the first two predictors, whereas the implied meaning items contributed almost nothing to predicting beginner examinees' test performance. Since the effects of the implied meaning items were not significant, implied meaning was not included as a predictor in the model.

Similar to the intermediate examinees' model, all three factors together explained the total score variance of the advanced examinees. However, the literal and intended meaning items could explain only about half of the score variance (49%). Therefore, the implied meaning items played a more important role in predicting test performance in the advanced group than in the intermediate group.
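Cumulative R-square values of the kind reported above (for example, 74% for the literal items alone and 93% once the intended items are added in the beginner group) can be illustrated with a small calculation. The sketch below computes R-square for one predictor and for two predictors from Pearson correlations, using the standard closed-form expression for a two-predictor least-squares regression with an intercept. The data and function names are hypothetical, and the significance-based entry and removal rules of an actual stepwise procedure are not shown.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_square_one(x, y):
    """R-square of a simple regression of y on x."""
    return pearson_r(x, y) ** 2


def r_square_two(x1, x2, y):
    """R-square of a regression of y on x1 and x2 (x1 and x2 not collinear)."""
    r1, r2, r12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


# Hypothetical subscores; the total is exactly the sum of the two parts.
literal = [1, 2, 3, 4]
intended = [1, -1, -1, 1]
total = [a + b for a, b in zip(literal, intended)]

step1 = r_square_one(literal, total)
step2 = r_square_two(literal, intended, total)
print(f"step 1 R-square = {step1:.3f}, step 2 R-square = {step2:.3f}")
print(f"R-square change = {step2 - step1:.3f}")
```

The difference between the two values is the additional share of total score variance attributed to the second predictor, which is how the contribution of each item type is read from the model summaries.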

STRUCTURE OF THE CHINESE READING TEST
The results of the CFA show that the reading test was structured according to the three types of reading (literal, intended, and implied meaning comprehension). This underlying test structure represents Purpura's (2004) meaning-based model of language knowledge. Other studies involving L2 reading ability derived from Purpura's model have usually had two underlying factors. For example, Liao (2008) operationalized L2 reading ability variables in terms of literal meaning (understanding explicitly stated information) and pragmatic meaning (understanding implicit information). Similarly, Kim, A. Y. (2011) explained reading ability with semantic (literal and intended) and pragmatic (implied) meaning. Contrary to these previous studies, the test constructs of the present study were defined in terms of the three separate types of meaning (literal, intended, and implied) that Purpura (2004) originally differentiated. Also, the test items were developed to measure examinees' ability to understand the three different types of meaning. Therefore, the CFA results provided empirical evidence of construct validity in the present study, as well as in Purpura's (2004) theoretical model of language knowledge.
While the overall fit indices and significance of all parameters confirmed that the test items measured the underlying constructs of reading ability as intended, a few compromises had to be made during the model-building process. First of all, the two nonsignificant items (Items 2 and 3) were deleted from the analysis. Test item number 2 (literal meaning item) provided a short paragraph in which test-takers had to choose, among four options, the one concurring with the content of the text. However, the phrase "拒绝别人" (refuse other people), which was one of the options, also appeared in the text; thus, participants apparently tended to choose this option because of its similarity with the text, not because they understood the literal meaning of the phrase. Test item number 3 (intended meaning item) required test-takers to read three sentences and then put them in the correct order. However, this sequencing could be accomplished only by understanding the first word of each sentence, rather than by truly understanding the meaning between the lines. Since the two items failed to measure participants' true understanding of the literal and intended meaning as intended, this may explain why the loadings of these two items were not statistically significant.
The other problem of the model-building process involved the relatively large contribution of error loadings, as discussed earlier. Because of these error contributions, it was difficult to conclude that the test performance was mainly explained by the underlying factors of reading ability. That is, something else was involved in the test performance other than the ability to understand literal, intended, and implied meaning. Possible reasons for this problem may be found in the test development and administration procedures. For practical reasons, only 20 items were developed, and two of these 20 were deleted during the analysis. Therefore, the number of test items was not sufficient to adequately measure the three trait factors. For example, only four items were intended to measure the participants' ability to understand implied meaning. Moreover, the sample sizes for the three different examinee groups (beginner, intermediate, and advanced) were not balanced. While the beginner and intermediate groups had 91 and 110 examinees, respectively, the advanced group had only 47 examinees. Due to these two main reasons, a few compromises were inevitable in structuring the test performance. As a result, the underlying trait structure could only partially explain participants' test performance.
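To make the point about error contributions concrete, the sketch below shows how a standardized factor loading splits an item's variance into a part explained by the trait factor and a part left to error. It assumes a standardized solution in which each item loads on a single factor; the loadings are hypothetical, since the study's estimates are not reproduced in this section, and the function name `variance_split` is introduced only for illustration.

```python
import math


def variance_split(loading):
    """Split a standardized item's variance for a single-factor loading.

    Returns (explained, unique, error_path), where explained = loading**2,
    unique = 1 - loading**2, and error_path = sqrt(unique).
    """
    explained = loading ** 2
    unique = 1.0 - explained
    error_path = math.sqrt(unique)
    return explained, unique, error_path


# Hypothetical standardized loadings (NOT the study's estimates)
hypothetical_loadings = {"item_a": 0.70, "item_b": 0.45, "item_c": 0.30}

for name, lam in hypothetical_loadings.items():
    explained, unique, error_path = variance_split(lam)
    print(f"{name}: loading={lam:.2f}, explained={explained:.2f}, "
          f"unique={unique:.2f}, error path={error_path:.2f}")
```

Under this assumption, an item with a loading below about .71 has more than half of its variance attributed to something other than the trait factor, which is one way to read the observation that the trait structure only partially explained test performance.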

GROUP COMPARISONS OF TEST PERFORMANCE
In each of the reading item types, the three groups of examinees (beginner, intermediate, and advanced) showed significant differences in their performance. At the beginning of the study, cut-scores were determined based on the Angoff (1971) method. These cut-scores were then used to create three groups from the test scores (beginner, intermediate, and advanced). We then examined the three groups' performance on the sets of items that assessed their ability to understand literal, intended, and implied meaning. According to the ANOVA results, the three groups' reading performance differed on each of the three types of meaning items.
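The cut-score step can be illustrated with a small sketch. It assumes one common variant of the Angoff procedure, in which each judge estimates the probability that a borderline examinee answers each item correctly and the cut-score is the sum of the item-wise mean estimates. The paper does not describe its exact procedure here, so the ratings, the five-item test, and the function names are all hypothetical.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """ratings[j][i] is judge j's estimated probability that a borderline
    examinee answers item i correctly. The cut-score is the sum over items
    of the mean estimate across judges."""
    n_items = len(ratings[0])
    item_means = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    return sum(item_means)


def classify(score, cut_intermediate, cut_advanced):
    """Assign a proficiency label from a total score and two cut-scores."""
    if score >= cut_advanced:
        return "advanced"
    if score >= cut_intermediate:
        return "intermediate"
    return "beginner"


# Hypothetical ratings from three judges on a five-item test
borderline_intermediate = [
    [0.6, 0.5, 0.4, 0.3, 0.2],
    [0.7, 0.5, 0.3, 0.3, 0.2],
    [0.5, 0.5, 0.5, 0.3, 0.2],
]
borderline_advanced = [
    [0.9, 0.8, 0.8, 0.7, 0.6],
    [0.9, 0.9, 0.7, 0.7, 0.6],
    [0.9, 0.7, 0.9, 0.7, 0.6],
]

cut_int = angoff_cut_score(borderline_intermediate)
cut_adv = angoff_cut_score(borderline_advanced)
for score in (1, 3, 5):
    print(score, classify(score, cut_int, cut_adv))
```

Two sets of ratings are needed because three groups require two boundaries: one between beginner and intermediate, and one between intermediate and advanced.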
Further regression analyses explained how the three groups differed in their reading ability. The beginner examinees seemed to mainly understand literal and intended meaning. Most of their reading performance (93% of the variance) was based on these two types. With only the literal meaning predictor, 74% of the score variance was explained. Therefore, it can be concluded that beginner examinees could comprehend explicitly stated literal meaning from the text, while they had difficulty inferring meaning from the text or outside of the text. However, this finding is not surprising because there were no items created specifically for the beginning level that assessed implied meaning. Participants in the beginner level might have guessed answers for the implied meaning items, which were too difficult for them, and thus, the variance became random. Assessing implied meaning might only be possible at higher levels, but the absence of implied meaning items targeting the beginning level makes it difficult to conclude that beginner-level learners entirely lack the ability to understand implied meaning.
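The stepwise figures can be read as increments: each model's cumulative R² minus that of the previous model gives the share of variance added by the newly entered predictor. The short sketch below applies this arithmetic to the percentages reported in the text (74% and 93% for the beginner group; 26% and 49% for the first two models of the advanced group). The function name `incremental_r2` is introduced only for illustration.

```python
def incremental_r2(cumulative):
    """Convert cumulative R-squared values from successive stepwise models
    into the increment contributed by each newly added predictor."""
    increments = []
    previous = 0.0
    for r2 in cumulative:
        increments.append(round(r2 - previous, 2))
        previous = r2
    return increments


# Cumulative R-squared values as reported in the text
beginner = incremental_r2([0.74, 0.93])   # literal, then literal + intended
advanced = incremental_r2([0.26, 0.49])   # literal, then literal + intended

print("beginner increments:", beginner)
print("advanced increments:", advanced)
```

Read this way, the intended meaning items added 19 percentage points for beginners and 23 for advanced examinees, and about half of the advanced group's score variance was still unaccounted for after the first two predictors.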
The intermediate examinees appeared to handle all three types of meaning items, as the effects of all three predictors were statistically significant and were included in the model. For the intermediate level, two literal, three intended, and two implied meaning items were developed, and these types explained 29%, 39%, and 31% of the score variance, respectively. Since all three types of meaning were treated as equally important on the test, the intermediate examinee performance results correspond to the predicted amount of score variance. Thus, this group of examinees could understand meaning that was explicitly and implicitly stated in the text and could make inferences about text meaning using their background knowledge at their reading ability level. The last group, advanced examinees, appeared to have the ability to understand any type of reading item. Their reading performance was only minimally or partly explained by the literal meaning predictor alone (26%, see model 1 in Table 5) or by the literal and intended meaning predictors together (49%, see model 2 in Table 5). That is, the advanced examinees' reading performance could not be fully explained if any of the three types of items was left out. As seen in the regression analyses, the three examinee groups showed distinct performance patterns with respect to the three different types of meaning items.
Volume 14(1), February 2014 (http://dx.doi.org/10.17576/GEMA-2014-1401-06) ISSN: 1675-8021
In foreign language classes in Korea, reading is often taught by two approaches: extensive reading, focusing on the top-down model, and intensive reading, focusing on the bottom-up model (Chen, 2010; Zhao, 2004; Zhou & Li, 2009). Chinese reading at the beginning level most commonly involves intensive reading (Kim & Park, 2013). At this level, reading comprehension focuses on sounds, letters, words, and grammatical rules (Zhou & Li, 2009). Therefore, the general focus is on understanding the surface-level meaning attached to the text. As the present research reports, while beginning-level participants were likely to get high scores on items asking for literal meanings, they were less likely to score well on items asking for implied meanings. Intermediate-level Chinese teaching and learning mostly emphasize intensive reading with some extensive reading (Kim & Park, 2013). Teaching focuses on having students establish their own reading strategies, including skimming and scanning, searching for keywords, and finding the gist of texts. Students are guided to understand not only the literal meaning, but also the interpretive-level meaning attached to the text. Therefore, at the intermediate level, participants were found to score higher on items asking for intended meanings. At the most advanced level of Chinese teaching and learning, most reading concentrates on extensive reading involving a wide variety of texts (Kim & Park, 2013). Learners are encouraged to go beyond understanding information given in the text, and to seek the implied-level meaning attached to the text, aiming to understand the author's intentions. Therefore, advanced-level learners may have been able to enhance their skills in comprehending not only literal and intended meanings, but also implied meaning. The advanced-level learners obtained positive results in all sections, including items related to intended and implied meanings.

CONCLUSION AND IMPLICATIONS
The current study examined the underlying trait structure of the reading items developed based on Purpura's (2004) language ability model. Overall, the test performance structure represented the theoretical model, and the items appeared to measure literal, intended, and implied meaning, as intended. A further analysis of group comparisons revealed different performance patterns across the beginner, intermediate, and advanced examinee groups. Lower-level examinees' reading performance largely depended on literal meaning items, while higher-level examinees exhibited a more balanced understanding of literal, intended, and implied meaning from the text. As argued above, however, this finding might be due to the test item types created for each level.
The present research has contributed to enhancing the validity of Chinese reading proficiency assessment tools, with the aim of helping them more accurately assess learners' Chinese reading ability by presenting a theoretical foundation for developing testing items. Zhao (2004) points out that teaching and learning Chinese reading is still quite slanted toward acquiring linguistic knowledge, including vocabulary and grammar. Such teaching and learning methods may be conducive to cultivating an understanding of literal meaning, but may minimally help learners foster their comprehension of intended meanings that require reading between the lines. It is even more difficult to expect such methods to help learners understand implied meanings that require reading beyond the lines. In addition, most Chinese reading assessment has not attempted to measure learners' understanding of different types of meaning, including the New HSK, which is currently the most representative Chinese proficiency test available. A test is used not only as a tool to determine the efficiency of language teaching and learning, but also as a guide and stimulant for finding future directions for further teaching and learning practices (washback effects) (Sadeghi & Nikou, 2012; Salehi & Yunus, 2012; Zhao, 2004). The present research suggests a direction for future teaching and learning in Chinese reading so that Chinese language learners can equip themselves with balanced skills in reading comprehension, and eventually will be able to actively exchange and share information with others through the Chinese language.

LIMITATIONS AND SUGGESTIONS FOR FURTHER RESEARCH
As briefly mentioned earlier, the present study has a number of limitations. Due to the small sample size and unbalanced number of examinees for the three examinee groups, a more sophisticated statistical analysis (e.g., multi-group analysis and CFA for each examinee group) was not performed. The lack of reading items may also have made it difficult to explain the main effects of each item developed to elicit examinee ability to comprehend the different types of meaning. Therefore, a larger sample size and a more careful selection of examinees representing a wide range of reading ability may better explain Chinese learners' strengths and weaknesses in their reading, and may further inform how such strengths and weaknesses are different across ability levels. Beyond the statistical analyses, additional qualitative analyses of examinee performance, such as think-aloud protocols and interviews, would also contribute to a better understanding of learners' reading ability.
Not only in China, but also in countries where Chinese is taught as a second or foreign language, including Korea, research on Chinese testing and evaluation is quite small in number, compared to research on Chinese language teaching and learning. This is even more so with research on Chinese reading ability tests. So far, Chinese proficiency tests have been functioning as a tool to measure the efficiency of Chinese teaching and learning. As tests have been advancing to the forefront of Chinese language instruction as a guide and impetus for Chinese teaching, it is strongly anticipated that research on Chinese reading ability tests will gain momentum and will become increasingly important in the near future.

Choose the correct answer.
1. My brother used to be shorter than me when we were young, but now he has grown taller. His height is 182 cm, which makes me feel jealous.
★Based on the statement above, the current situation is that: A) I am 180 cm tall. B) I am shorter than my brother. C) My brother's height is short. D) I feel sympathy for him.
★The answer that summarizes this paragraph is: A) How to reject others B) How to get respect C) How to prevent misunderstanding D) How to ask for forgiveness
3-4. List the answers in the right order.
3. (1) She has left me a strong impression.
(2) It is because she is very passionate and polite.
(3) I met Miss Wang for the first time.
4. (1) This type of fish lives deep in the sea.
(2) They look like a group of swimming light bulbs.
(3) They have a luminous body.
Choose the correct answer.
What does "a true friend" mean? Everyone has a different thought. In my opinion, a true friend is courageous enough to help you when you face a difficulty; a true friend stays with you to make you happy when you feel lonely or devastated; a true friend is always dependable whether you are rich or poor.
5. According to the passage, "a true friend": A) Shares experience B) Cares for your family C) Pulls you away from danger D) Leads you through hard times
6. The main topic of this passage is
Choose the best answer that corresponds with the given passage.
7. My school has been holding an annual speech contest since 1995. This year is the 15th time the contest is held. This year's contest is on Saturday and I am certain that I will receive a good result. I will outperform with a high level of skillset. Therefore, you should wait to hear my good news.
A) The contest is scheduled on Saturday. B) I participate in this contest every year. C) I am confident about this contest.
D) The level of this contest is not too high.
8. Sometimes, a storyteller purposely goes silent for a moment when people are absorbed in the story. The purpose of this behavior is to arouse curiosity from the audience and make them listen more carefully. In order to create a more pleasant atmosphere, the storyteller uses the short breaks to observe the listeners' reaction. Chinese people call this way of speaking "intended pausing."
A) Intended pausing attracts the audience's notice.
B) The storyteller has no curiosity. C) When telling a story, one should not go silent at any time.
D) The audience favors intended pausing.
9-10. Choose the correct answer.
My friend purchased a new car. Last weekend, he and I went out for a test drive. In order to test the car's performance, we drove fast. My friend said in excitement, "Although this car is not so famous, its speed is as fast as luxury cars." At that moment, the car in front of us suddenly stopped, so my friend had to step on the brake abruptly. Our car finally stopped after skidding for a long time. As we had almost crashed into another car, we were terrified. "Now I know the difference between regular and luxury cars," he said.
In fact, both regular and luxury cars can run at high speed, but there is a big gap in stopping the car. Luxury cars usually stop more quickly than regular cars. This gap also applies to life. Smart people not only work efficiently but also know how to stop promptly. The best solution for a situation that seems to have no future is to stop as quickly as possible.
9. According to the passage, the author considers the main difference between smart people and regular people to be:
Chao Cao wished to know how much the elephant he recently received weighed. His officials brought forward various opinions. One official suggested producing a huge scale, but it was impossible to produce a scale as big as the elephant. Another offered to cut the elephant into pieces and measure its weight; however, there was no point in measuring the elephant's weight if it died. Though many people proposed a diversity of ideas, Chao Cao was not satisfied.
Then, Chao Cao's youngest son, Chao Chong, said, "Father, I know how to measure the elephant's weight." As Chao Chong explained the method, (1) Chao Cao listened to him, (2) gave orders to prepare for the measurement, (3) gathered a crowd to watch, (4).
People went to the river, where a ship was floating. Chao Chong gave orders to put the elephant on the ship, to wait until the ship stayed still, and to draw a line on the side of the ship at the water level. Chao Chong told people to take the elephant off the ship and to fill the ship with rocks of various sizes. Then, the ship began to sink little by little. As the surface of the water became aligned with the line drawn on the side of the ship, Chao Chong stopped people from filling the boat with rocks. Finally, the officials' eyes widened with amazement as they realized what had happened. They marveled, saying "What a great method! What a great method!" After all, everyone knew that the weight of the rocks that filled the ship was equal to the weight of the elephant. Chao Cao looked at the officials, ( a
Most people always complain about shortage of money. Sociologists discovered the fact that when people actually have money, they tend to complain about shortage of time. As seen through several examples, the more money one has, the less time he has; the poorer one is or if one has no job, the more painful he is with boredom.
People pursue wealth in order to live a better life.However, once people gain wealth, they tend to become busier and eventually cannot live a better life.
When people confront shortage of money, many think, "If I have enough money, I would do …." In people's minds, the term "wealthy" represents freedom, independence, and doing whatever they want: for example, sunbathing by the shore in summer and going skiing in the mountains in winter.
However, when people are really wealthy, they realize that their plans cannot come true. The one and only reason is: "There is no time!" Moreover, people with high income are workaholics.
In conclusion, being wealthy and having enough time cannot be achieved at the same time. Thus, people say, "When you are young and poor, you wish to earn money with time. But, even if you have money in the future, you cannot buy time with money."
A young man had a job related to marketing. Although he worked hard for half a year, there was no successful achievement. However, each of his colleagues was comparatively successful. The man could not stand the pain of failure. At his boss's office, the man told him with shame that he did not fit into the job. "Do not mind other factors when you work because there is enough time until you can finally succeed. If you want to leave by then, I will let you go." The generosity of his boss touched the man's heart. The man thought that he would leave after achieving at least one successful task.
One year later, the man went into the boss's office. This time, he walked in with a light heart as he had been on top of the sales chart for seven consecutive months. In fact, the job suited the man. He wanted to know the reason why his boss had not fired him in the past.
"It's because I would have felt more resentful if I had let you go back then." The boss's response was unexpected. The boss explained, "I received over 100 applications when the company was recruiting, interviewed about 20 applicants, and finally hired you. If I had let you go, it would have meant a big failure in my career as well. I firmly believed that you could be acknowledged by the customers, as I had already acknowledged you, and that you just did not have enough time and chance. Actually, I had trust in myself rather than in you."
This is the story of a young man, and I AM the young man.
18. The reason why the boss did not let the man go is because A) the company urgently needed employees. B) customers favored the young man. C) the boss believed that his decision was right. D) the young man had various work experiences.
19. According to the passage, the following statements are true EXCEPT A) The competition rate was very high.
B) The recruiting session consists of only one step.
C) The company announces each employee's achievement every month. D) The customers' feedback is related to the employees' achievements.

The main topic of this passage is
A) a good leader guides the company to success. B) success is followed by the support of majority. C) one needs to learn how to manage relationships with people at work. D) confidence and generosity create a good outcome.

ABOUT THE AUTHORS
Mi Soon Kim earned a Ph.D. in Linguistics and Applied Linguistics from Beijing Language and Culture University. She is an assistant professor of Chinese education in the Graduate School of Education at Hankuk University of Foreign Studies, Korea. Her research interests include teaching Chinese as a foreign language and grammar studies.
Hyun Jung Kim earned an Ed.D. in Applied Linguistics from Teachers College, Columbia University. She is an assistant professor of the Graduate School of TESOL at Hankuk University of Foreign Studies, Korea. Her research interests include language assessment and language test validation.
FIGURE 2. Mean score difference for literal meaning
FIGURE 3. Mean score difference for intended meaning
FIGURE 4. Mean score difference for implied meaning

A
author mentions his own experience in the first paragraph in order to: A) Explain his opinion through exemplification. B) Give a typical example through exemplification. C) Propose his opinion through metaphor. D) Disprove his opinion through metaphor.
11-14. Choose the correct answer.
): "Not any one of you is as smart as my youngest son."
11. Chao Chong made use of the following in weighing the elephant, EXCEPT A
main topic of this passage is A) Weight of the elephant B) The decision of Chao Cao C) Science of the Last Han Dynasty D) Wisdom of Chao Chong
13. The phrase "let out exclamations" can be best placed in A.
Choose the correct answer.
15. What do people think when they have no money? A) Make plans for when they have enough money B) How to buy time with money C) How to earn more money D) Imagine what a busy life is like
16. The main topic of this passage is A) The rich and the poor B) Time and money C) Ideal and Reality D) People with enough time and people with lack of time
17. According to the passage, people with shortage of money think that A) work and time can be attained at the same time. B) the purpose of earning money should be clear. C) not having enough money is dissatisfactory. D) making dreams come true is followed by sacrifice.
18-21. Choose the correct answer.

TABLE 1. Item types by proficiency level

PROCEDURES
Data collection occurred in four stages: (1) The cut-off point for the score was established for reading ability level distinction; (2) The reading test was conducted; (3) The participants' responses were scored; and (4) Reading ability levels were identified based on test scores.

TABLE 2. Descriptive statistics for each group and entire examinees
Descriptive statistics of the test scores were computed for each group of examinees (beginner, intermediate, and advanced) and for the entire group of examinees. The results of the descriptive statistics are presented in Table 2. The preliminary test results indicate that overall, the test had an appropriate difficulty level for these learners, resulting in a normal distribution of scores. The distributions of scores were approximately normal for each examinee group and for the entire group of examinees. However, contrary to the beginner, intermediate, and entire group, the advanced group reported positive skewness (1.21) and kurtosis (0.57) values, indicating that more examinees in the advanced group received lower scores than the mean of the group (16.64

TABLE 3. Model summary for beginner examinees' reading test scores

TABLE 4. Model summary for intermediate examinees' reading test scores
Contrary to the beginner examinees' model, all three test components had to be included as predictors to explain the intermediate examinees' test score variance.

TABLE 5. Model summary for advanced examinees' reading test scores