
Here are our answers to the questions you have asked, grouped by theme below. Don’t hesitate to submit your own questions using this form. We will do our best to get back to you as soon as possible!

For additional Q&A, check out the NCES Frequently Asked Questions (FAQ).

If you do not see your question answered below, ask us! CLICK HERE to submit a question.


DATA ANALYSIS

Q1: Is there special software to use to analyze PIAAC data?

A: You can analyze PIAAC data with SAS, Stata, or SPSS, or with an online tool, the International Data Explorer (IDE). The Organization for Economic Co-operation and Development (OECD), in collaboration with international partners, has developed SAS and Stata macros that account for PIAAC’s complex sampling and assessment designs. SPSS macros were developed by the Data Processing and Research Center of the International Association for the Evaluation of Educational Achievement (IEA-DPC) and are distributed as free add-on software called the IDB Analyzer, which generates SPSS or SAS syntax for analysis. For basic analysis, you can use the IDE, a user-friendly online tool. The NCES IDE can be found here: http://nces.ed.gov/surveys/international/ide/ and the OECD IDE can be found here: http://piaacdataexplorer.oecd.org/ide/idepiaac/.

An explanation of the differences between the NCES IDE and OECD IDE can be found on the NCES IDE homepage. For more information on PIAAC data and to access the IDB Analyzer or SAS and Stata macros, go to: http://www.oecd.org/skills/piaac/data/.

Q2: Can I use SAS, Stata, or SPSS normally to analyze PIAAC data?

A: Instead of one proficiency score, PIAAC has 10 plausible values (PVs) that need to be combined in a certain way to come up with correct estimates and standard errors. Theoretically, one can look at the OECD Technical Report and come up with one’s own macro to estimate the proficiency levels and average scores or run regressions. However, to make it easier for researchers, the OECD has created software tools for analysis that take into account the complex sampling and assessment design of PIAAC. See our response to Q1 above for more information about these tools.
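To illustrate what "combined in a certain way" involves, here is a minimal Python sketch of the standard multiple-imputation combining rule commonly applied to plausible values. It is not the OECD macro code: the function name and inputs are hypothetical, and it assumes you have already computed one estimate and one sampling variance per plausible value (for example, with replicate weights).

```python
from statistics import mean, variance


def combine_plausible_values(pv_estimates, pv_sampling_variances):
    """Combine per-PV estimates into one estimate and standard error.

    pv_estimates: one estimate (e.g., a weighted mean) per plausible value.
    pv_sampling_variances: the sampling variance of each of those estimates.
    Both inputs are assumed to be computed beforehand (hypothetical inputs).
    """
    m = len(pv_estimates)
    if m < 2 or m != len(pv_sampling_variances):
        raise ValueError("need matching lists with at least two plausible values")
    estimate = mean(pv_estimates)               # final point estimate
    sampling_var = mean(pv_sampling_variances)  # average within-PV variance
    imputation_var = variance(pv_estimates)     # between-PV variance (divides by m - 1)
    total_var = sampling_var + (1 + 1 / m) * imputation_var
    return estimate, total_var ** 0.5
```

The standard error reflects both sampling error and the measurement uncertainty captured by the spread of the plausible values, which is why analyzing a single plausible value, or the average of the 10 values per person, tends to understate it.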

Q3: Do I need to use the macros at all times when dealing with PIAAC data?

A: Data management (e.g., combining multiple variables or creating an index) is done outside of the macros, in the software of your choice. The analysis is divided into two parts: the survey (background questionnaire) and the direct assessment. If one is analyzing the background questionnaire only, SAS and Stata can handle the full-sample and replicate weights, so one can use their regular complex survey design procedures. The SPSS complex sample design add-on allows one to specify cluster and/or stratification variables; however, it uses Taylor series linearization to produce the standard errors, whereas PIAAC data use jackknife repeated replication (JRR) or other forms of replication-based estimation, depending on the specific country’s sampling design.

The analysis of the direct assessment requires the use of the OECD-provided macros – i.e., if one wants to estimate literacy of adults or regress literacy on other independent variables.
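For background-questionnaire estimates, the replication-based approach mentioned in this answer can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the procedure built into SAS, Stata, or the OECD macros: the function names are hypothetical, and `factor` is a placeholder multiplier whose correct value depends on the replication method used for a given country's sample (consult the technical documentation for it).

```python
def weighted_mean(values, weights):
    """Weighted mean of `values` using the given weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def replicate_variance(values, full_weights, replicate_weights, factor=1.0):
    """Replication-based sampling variance of a weighted mean.

    replicate_weights: a list of weight lists, one list per replicate.
    factor: method-dependent multiplier (an assumption in this sketch).
    """
    full_estimate = weighted_mean(values, full_weights)
    # Re-estimate with each set of replicate weights and accumulate the
    # squared deviations from the full-sample estimate.
    squared_diffs = [
        (weighted_mean(values, rep) - full_estimate) ** 2
        for rep in replicate_weights
    ]
    return full_estimate, factor * sum(squared_diffs)
```

The design choice here is that the statistic is simply recomputed once per replicate weight, so the same pattern works for percentages or regression coefficients by swapping out `weighted_mean`.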

Q4: I am looking at the relationship between problem solving and educational attainment. I want to use the data from the PIAAC 2012 survey but it was difficult using the code book to find the scaled scores that are Below Level 1, at Level 1, at Level 2, and at Level 3 for problem solving. How was that coded?

A: There is not a variable on file that contains information on whether an individual’s score was Below Level 1, at Level 1, at Level 2, or at Level 3 for problem solving, because each individual is not assigned just one proficiency score or level. Each individual who takes the PIAAC assessment is given a set of 10 plausible values (imputed proficiency scores) ranging from 0 to 500, in order to account for the uncertainty associated with measures of skills in large-scale surveys and also to obtain more accurate estimates of group proficiency. These plausible values for problem solving are the variables PVPSL1 – PVPSL10 on the data file and in the code book. The proficiency levels are defined by score-point ranges and level of difficulty of the tasks within these ranges. The ranges for problem solving are defined as Below Level 1 (0-240 points), Level 1 (241-290), Level 2 (291-340), and Level 3 (341-500).
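As an illustration of how the score ranges above relate to the plausible values, the following Python sketch assigns a level to each plausible value and averages the weighted level shares across all plausible values. It is a hedged example, not official code: the function names are hypothetical, it assumes fractional scores are assigned using the lower bounds 241, 291, and 341, and it produces point estimates only (no standard errors). Respondents without problem-solving plausible values would need to be excluded beforehand.

```python
from bisect import bisect_right

# Lower bounds of Level 1, Level 2, and Level 3, taken from the ranges above.
LEVEL_LOWER_BOUNDS = [241, 291, 341]
LEVEL_LABELS = ["Below Level 1", "Level 1", "Level 2", "Level 3"]


def problem_solving_level(score):
    """Map one plausible value to its proficiency level label."""
    return LEVEL_LABELS[bisect_right(LEVEL_LOWER_BOUNDS, score)]


def level_shares(plausible_values, weights):
    """Weighted share of each level, averaged over the plausible values.

    plausible_values: one list of PVs per respondent (e.g., 10 values each).
    weights: the final sampling weight of each respondent.
    """
    n_pvs = len(plausible_values[0])
    total_weight = sum(weights)
    shares = {label: 0.0 for label in LEVEL_LABELS}
    for pv_index in range(n_pvs):
        for pvs, weight in zip(plausible_values, weights):
            level = problem_solving_level(pvs[pv_index])
            shares[level] += weight / total_weight / n_pvs
    return shares
```

Note that the level is assigned separately for each plausible value rather than once per person, which is why no single level variable exists on the file.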

You will need to use special data analysis tools in order to correctly analyze PIAAC data, taking into account these plausible values and the complex sampling design of PIAAC. For most basic analysis, we recommend using the PIAAC International Data Explorer (IDE). The IDE is a user-friendly online tool that will allow you to conduct PIAAC analyses with the plausible values and sampling weights. You could, for example, use the IDE to do an analysis of the proficiency level distribution at different levels of education.

For more complex analysis with the data files, there is a program called the IEA International Database (IDB) Analyzer, which creates SPSS or SAS code. There are also SAS and Stata macros, which can be found at this link: http://www.oecd.org/skills/piaac/data/.

For more information on the PIAAC Data Files, please visit Datasets & Tools.

Q5: How can I calculate the number/share of people who were excluded from problem solving in technology-rich environments (PS-TRE) due to their lack of computer skills?

A: You can use the variable PSLSTATUS to determine the percentage of the population who have plausible values (PVs) for PS-TRE. The variable also shows the percentage of cases that have literacy-related non-response, and those that were excluded from PS-TRE for computer-based reasons (CBA Non-Response).

The variable PBROUTE will give more detailed information on the reasons why the respondent was excluded from the computer-based assessment and routed to the paper-based assessment (i.e., they failed a Core test of ICT skills, they refused the computer-based assessment, or they had answered a question indicating that they had no computer experience). For the U.S., there should be about 16% of the weighted sample who were excluded because of the lack of computer knowledge only. Note that this group does not include those that passed the Core test of ICT skills but failed the computer-based Core test of literacy and numeracy skills, and who were subsequently routed to the paper-based assessment.

The cases excluded for literacy-related non-response (about 4% of the U.S. weighted sample) are also flagged by domain corresponding variables called LITSTATUS and NUMSTATUS. These cases did not respond to the background questionnaire as a result of language difficulties or learning or mental disabilities and do not have plausible values for any of the domains.
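The weighted shares described in this answer can be sketched in Python as below. This is a minimal illustration with a hypothetical function name; the category labels in the usage note are placeholders, not the actual coded values of PSLSTATUS or PBROUTE, which should be taken from the code book.

```python
from collections import defaultdict


def weighted_shares(categories, weights):
    """Weighted percentage of respondents in each category.

    categories: one status value per respondent (e.g., from PSLSTATUS).
    weights: the final sampling weight of each respondent.
    """
    totals = defaultdict(float)
    for category, weight in zip(categories, weights):
        totals[category] += weight
    grand_total = sum(totals.values())
    return {category: 100 * total / grand_total
            for category, total in totals.items()}
```

For example, `weighted_shares(["has PVs", "CBA non-response", "has PVs"], [2.0, 1.0, 1.0])` gives 75 percent for the first placeholder label and 25 percent for the second. Standard errors for such percentages still require the replicate weights.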

Q6: How do I conduct analyses with the IALS and ALL data I received from Statistics Canada?

A: After you use the provided SPSS Syntax Files (.sps) to create the SPSS Data Document (.sav), you need to make a few edits to the produced data file before starting your analysis in the IEA International Database (IDB) Analyzer.

  • In both the IALS and ALL data files, you need to delete the leading 0 from Replic01 – Replic09 and rename them as Replic1 – Replic9.

  • In the IALS data file, you also need to rename the variable WEIGHT as POPWT.
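The two renaming rules above can be expressed as a small Python sketch that maps old variable names to new ones. It only illustrates the name mapping; the function name is hypothetical, and the actual renaming would be carried out in SPSS (or whichever tool you use to edit the .sav file).

```python
import re


def renamed_variables(names, rename_weight=False):
    """Return the variable names after applying the renaming rules.

    Replic01..Replic09 lose their leading zero; with rename_weight=True
    (IALS file only), WEIGHT becomes POPWT. Other names are unchanged.
    """
    renamed = []
    for name in names:
        match = re.fullmatch(r"(?i)(replic)0([1-9])", name)
        if match:
            name = match.group(1) + match.group(2)  # e.g., Replic01 -> Replic1
        elif rename_weight and name.upper() == "WEIGHT":
            name = "POPWT"
        renamed.append(name)
    return renamed
```

For the IALS file you would call it with `rename_weight=True`; for the ALL file, leave the default so that only the replicate weight names change.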

After selecting the analysis file in the IDB Analyzer, when the “Select Study Type” option pops up, please choose IALS/ALLS. You then need to select IALS/ALLS(Rescaled data - 2013) in the Analysis Type drop-down menu before using the IDB Analyzer to generate SPSS syntax for analysis. For more information on how to use the IDB Analyzer, please refer to the PIAAC Distance Learning Dataset Training module on Considerations for Analysis of PIAAC Data.

For some countries, the populations included in the file provided by Statistics Canada are not aligned with the PIAAC sample population. Therefore, you have to combine or exclude parts of the populations to have a PIAAC-comparable sample and match the estimates for IALS and ALL included in the International Data Explorer (IDE) and rescaled estimates reported elsewhere (in reports published after October 2013). The Canadian population is reported separately as Canada(English) and Canada(French) in both the IALS and ALL files from Statistics Canada, so you need to combine these populations using their country identification codes (CNRTID) if you want to do analysis of the Canadian population as a whole.

Additionally, the IALS dataset reports data from Great Britain as a whole (England/Scotland/Wales), while PIAAC collected data from and reports data on England/Northern Ireland. Therefore, the comparable population between IALS and PIAAC is the population from England. To select only the population from England, and exclude those from Scotland or Wales from analysis with the IALS file from Statistics Canada, you can use the variable GBR. To select a comparable population with the PIAAC data file, you would also have to use the variable for participating country or sub-national entity code (CNTRYID_E) in the PIAAC data from England/Northern Ireland, in order to select only the population from England, and exclude those from Northern Ireland. Additionally, in IALS, a few countries included adults older than 65 in their sample, so analysis with the IALS file from Statistics Canada needs to exclude those over 65 to be comparable to the PIAAC population. The variable AGEINT can be used to exclude that population.

DATA FILES

Q1: How do I access U.S. data files?

A: The U.S. PIAAC Public-Use Files (PUF) are available on the NCES PIAAC website in SPSS, SAS, and ASCII formats. To get a Stata-readable file, you can use Stat/Transfer software. You may need to edit some labels in the syntax to make it run correctly. The SAS files are linked to the format files, so please be sure to run the SAS format programs before you import the data files into SAS. For more information, please visit: http://nces.ed.gov/surveys/piaac/datafiles.asp

To access U.S. PIAAC Restricted-Use Files (RUF), you must apply for a restricted-use license from NCES. For more information about accessing the RUF, please refer to Q3 below.

Q2: What are the differences between the U.S. Public Use File (PUF) on the NCES PIAAC website and the U.S. PUF available for download on the OECD PIAAC website? Is there only information added or has some old information been removed?

A: The U.S. PUF data file on the NCES PIAAC website includes additional variables above those included in the U.S. PUF from the OECD PIAAC website. The PUF on the NCES website includes U.S.-only variables on topics such as race/ethnicity, English language ability, health information practices, as well as all variables that follow national routing for analysis. In addition, PUFs of other PIAAC participating countries are only available on the OECD PIAAC website.

Q3: What additional information is available in the U.S. Restricted-Use File (RUF)? How can I gain access to the RUF?

A: The U.S. Restricted-Use File (RUF) contains more detailed information for data that were suppressed in the public-use dataset due to respondent confidentiality concerns, such as continuous age and earnings variables and more detailed industry and occupation variables. The RUF contains all variables at the level of detail at which the questions were asked and answered in the Background Questionnaire (BQ). If some variables are missing, top-coded, or suppressed in the PUF, they will be present in the RUF in the original form of the answer to the question in the BQ. A side-by-side comparison of the variables included in the U.S. PUF and the U.S. RUF can be found in Appendix E of the U.S. PIAAC 2012/2014/2017 Technical Report.

To access the U.S. restricted-use data, you must apply for and obtain a restricted-use license from NCES. More information on the process is available at: http://nces.ed.gov/pubsearch/licenses.asp. Please note that access to the RUF is only available to individuals residing in the U.S. Please allow up to several months for the NCES review of restricted-use license applications.

DATA AVAILABILITY

Q1: Is there any way to get figures for the U.S. competencies grouped by state or county?

A: While the U.S. PIAAC 2012/2014 and 2017 samples do not separately have enough respondents to produce accurate estimates of adults’ skills at the state or county level, NCES was able to combine the samples to develop a model that produces estimates of adults’ skills for all U.S. states and counties. This model was developed using a technique called small area estimation (SAE).

In April 2020, NCES released an interactive, user-friendly mapping tool called the U.S. PIAAC Skills Map: State and County Indicators of Adult Literacy and Numeracy. The Skills Map includes state- and county-level estimates of average literacy and numeracy scores for all U.S. states and counties, including the District of Columbia, as well as the proportions of adults at different PIAAC proficiency levels.

Q2: I am trying to clarify how the Education and Skills Online assessment can be utilized. Would geographic areas (counties) be able to get data on their region? When will it be available? Will there be any cost in establishing a specific group/region for assessment purposes?

A: The Education and Skills Online (E&S Online) assessment can be used at an individual or organizational level, for example by an adult learning center or an employer. The data and results would be available to the individual or the organization (including county-wide or state-wide programs) that sponsored the assessment. A county could sponsor the use of E&S Online and administer the assessment to people in its region or to particular populations in its region (e.g., the prison population or the unemployed population). The sponsoring organization will own the data and can use the results for its own purposes. Please note that a county or any other sponsoring organization would not be able to get data or results from unaffiliated individuals who took Education and Skills Online on their own.

The E&S Online is available at the following link: http://www.oecd.org/skills/ESonline-assessment/

The cost to use E&S Online is around $10-15 per individual, with some discounted rates for groups and organizations purchasing large quantities of the assessment.

Q3: Has information on country of birth been collected in the United States?

A: Several variables exist in the International Data Explorer (IDE) and in the Public-Use Files (PUF) and Restricted-Use Files (RUF) that address this question. In the IDE and PUF, relevant variables include J_Q04A (whether respondent was born in the country: native or non-native), CNT_BRTHUS_C (respondent’s country of birth, collapsed into 2 categories: native or foreign-born), and BIRTRGNUS_C (respondent’s country of birth, collapsed into 3 categories: North America and Western Europe; Latin America and the Caribbean; Other). In addition to the above listed variables, the RUF includes a detailed set of variables J_Q04bUS / J_S04b (respondent’s country of birth: Mexico, China, Philippines, India, Russia, Colombia, Other/Specific answer to the ‘Other’ option).

Q4: I am trying to link observations geographically and by occupation with other data. Is an occupation variable with 4-digit ISCO codes available in the restricted-use dataset? Is the occupation variable with 2-digit ISCO codes available in the public-use dataset or only in the restricted-use dataset?

A: In the U.S. public-use data set, current occupation is available for 1-digit (ISCO1C), 2-digit (ISCO2C), and 3-digit (ISCO08_CUS_C) ISCO codes. The 4-digit ISCO occupation variable (ISCO08_C) is only available in the restricted-use data set. Note that the level of detail in the 4-digit ISCO code variable may yield analyses with weak reporting power. For example, dividing the U.S. PIAAC 2012 sample of about 5,000 adults (not all of whom had an occupation or reported it) into more than 400 detailed ISCO occupation codes means that many of the occupation codes will have only one or two cases, which is not a large enough sample size to produce reliable, stable estimates and is a disclosure risk. If you are analyzing occupation in conjunction with proficiency levels, we advise using the 1-digit ISCO occupation variable rather than a more detailed variable. Some of the reports have used a 4-category occupational variable (ISCOSKIL4) that collapses the 1-digit ISCO occupation variable into four derived categories: skilled occupations; semi-skilled white-collar occupations; semi-skilled blue-collar occupations; and elementary occupations.

The geographic variable available for analysis is REGION_US, which includes the following U.S. census regions: Northeast, Midwest, South, and West.

Q5: How can I compare results from PIAAC with results from the Program for International Student Assessment (PISA) in Turkey?

A: Turkey participated in Round 2 of PIAAC, collecting data in 2014, so results from the administration of PIAAC in Turkey are now available in PIAAC international reports from the OECD. Turkey’s data is also available as a Public Use File (PUF) from the OECD as well as in the NCES PIAAC International Data Explorer (IDE) and the OECD IDE. In the absence of evidence from a study linking PISA and PIAAC, though, caution is advised in comparing the results of the two assessments. The overlap between the target populations of PIAAC and PISA is not complete; and while the concepts of literacy in PIAAC and reading literacy in PISA, and the concepts of numeracy in PIAAC and mathematical literacy in PISA are closely related, the measurement scales are not the same. However, the figures on pages 206-207 of OECD Skills Outlook 2013: First Results from the Survey of Adult Skills show that there is a reasonably close correlation between countries’ performance in the different cycles of PISA and the proficiency of the relevant age cohorts in literacy and numeracy in PIAAC.

A more detailed description of the relationship between the two assessments, including differences in the target populations and skills assessed, can be found in Chapter 6 of OECD Skills Outlook 2013: First Results from the Survey of Adult Skills. You could also listen to this presentation by Patrick Bussière, co-leader of Canada's PIAAC team, describing the similarities and differences in design and purpose between PISA and PIAAC. In the videotaped webinar, Patrick Bussière suggests the reasons why it could be of interest to compare the results from the two surveys as we move forward.

Q6: Have the test items for the Computer Literacy Core (CLC) and Computer Numeracy Core (CNC) been released? If so, where can I find them?

A: The test items for CLC and CNC have not been released, because they were used in the additional rounds of PIAAC and will be used for the purpose of trend data in future administrations. The written text descriptions available for a few of the items in the OECD Technical Report of the Survey of Adult Skills are the best information available on the CLC and CNC items. On page 5 of Chapter 21 (page 583 of the PDF), there are written descriptions of two of the CLC items: SGIH and Election Results. On page 10 of Chapter 21 (page 588 of the PDF), there is a written description of one of the CNC items: Bottles. Appendix 1 of the OECD Technical Report (page 622 of the PDF) also includes some characteristics on the difficulty levels of the CLC and CNC items.

DERIVED VARIABLES AND INDICES

Q1: How is the derived employment variable C_D05 defined? It appears to reflect a current employment status, but when matched with other employment status variables, such as C_Q07, a number of people are presenting as employed in one and unemployed in another.

A: Variable C_D05 is a “routing” variable that was derived using five preceding questions and was used to route or branch respondents to subsequent sections for employed, formerly employed or non-employed adults. This variable flags the currently active population, i.e., the labor force measured in relation to a short reference period such as one week. It is the “objective” measure of employment because it was based on an objective definition of employment and is determined by the combination of the participant’s responses to five questions on having/not having or seeking employment. C_D05 was used to determine which subsequent job-related questions the respondent received, so it has an additional importance in terms of availability of data in other sections of the background questionnaire. You can take a look at the derivation of it in the background questionnaire. For information about the motivation, definition, and rationale for this variable in accessible language (rather than the exact syntax and coding of the variable), please refer to the Background Questionnaire Framework.

There is also a “subjective” measure of employment, variable C_Q07, which asks respondents to self-report their own status (employed full time, student, retired, etc.). This variable provides a broader indication of respondents’ current situation.

Therefore, the two variables do not necessarily line up for various reasons such as definition differences, objectivity vs. subjectivity, and misreporting.

Q2: Which items comprise the “Readiness to Learn” derived index in the PIAAC data set?

A: The “Readiness to Learn” index (READYTOLEARN) is derived from the six “About Yourself: Learning strategies” questions:

  • I_Q04b: relate new ideas into real life
  • I_Q04d: like learning new things
  • I_Q04h: attribute something new
  • I_Q04j: get to the bottom of difficult things
  • I_Q04l: figure out how different ideas fit together
  • I_Q04m: look for additional information for clarity

You can view the items here: http://nces.ed.gov/surveys/piaac/final_en_bq.htm#I_Q04b.

Q3: What is the meaning when variables say "Derived by CAPI" or "Trend-IALS/ALL"?

A: When a variable is labeled as “Derived by CAPI”, it means that the variable was derived/coded through the computer-assisted personal interview (CAPI) system that the interviewers used and was a variable created specifically for CAPI in order to route or branch respondents to subsequent questions. Since the variables were coded by the system in the process of the interview, some inconsistency between Background Questionnaire variables and CAPI derived variables may exist. When a variable is labeled as “Trend-IALS/ALL,” it means that the variable can be linked back to an identical or similar variable that was used in one or both of the previous international adult literacy assessments, IALS and ALL. These variables must be used when conducting an analysis across the assessments.

Q4: How does PIAAC convert educational attainment to years of school?

A: PIAAC used the International Standard Classification of Education (ISCED) to map educational attainment to years of schooling. The OECD Technical Report reports the ISCED classifications for each level of educational attainment and how each level of educational attainment was converted to total years of schooling for the U.S. and all other countries. The ISCED mapping and conversion of educational attainment to years of schooling for the U.S. is found in Appendix 5. The mapping used for other countries is also found in the same appendix of the technical report. In the PIAAC dataset, information on years of schooling is available in the derived variable for highest level of education imputed into years of education (YRSQUAL).
