NEWS | R Documentation |
Added support for the National Household Education Survey (NHES) set of data. See downloadNHES
for instructions to locate and obtain the NHES data files, and see readNHES
for reading the NHES data for analysis.
Added support for the School Survey on Crime and Safety (SSOCS) set of data. See downloadSSOCS
for instructions to locate and obtain the SSOCS data files, and see readSSOCS
for reading the SSOCS data for analysis.
append.edsurvey.data.frame.list
now uses labA
and labB
in preference to existing labels. It also now accepts label names other than label
. When the resulting labels are not unique, it generates new ones. Also, if a single edsurvey.data.frame
is added to an edsurvey.data.frame.list
, and the labels for the edsurvey.data.frame.list
appear to be automatically generated, then the labels are automatically generated for the output of the call to append.edsurvey.data.frame.list
as well. These changes should make the output of append.edsurvey.data.frame.list
seem more logical.
showCutPoints
now returns one line per set of cut points and does not show the cut point labels.
updated warnings and errors when an edsurvey.data.frame
is passed to a variety of functions that require PSU or stratum variables when those variables are not on the data.
cor.sdf
now allows dynamic variables for the variables being correlated (x
and y
)
searchSDF
now generates a tabled output when passed an edsurvey.data.frame.list
.
SD
used to use the population variance estimator and now uses the sample variance estimator. This also affects standard deviations calculated with the gap
function.
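The difference between the two estimators is the divisor. The following is a minimal base R sketch, unweighted for simplicity; it is not EdSurvey's weighted implementation, and the values in `x` are made up for illustration.

```r
# Illustrative values only; not survey data.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Population variance estimator: divide the sum of squares by n.
pop_var <- sum((x - mean(x))^2) / n

# Sample variance estimator: divide by n - 1 (this is what stats::var computes).
samp_var <- sum((x - mean(x))^2) / (n - 1)

pop_sd <- sqrt(pop_var)
samp_sd <- sqrt(samp_var)
```

The sample estimator is always slightly larger, and the gap shrinks as `n` grows.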
SD
used to set jkSumMultiplier
to 1 by default and now uses the value from the edsurvey.data.frame
. While this was the documented behavior, it was counterintuitive since all other EdSurvey
functions use the value directly from the edsurvey.data.frame
.
when calling achievementLevels
with both returnDiscrete = FALSE
and returnCumulative = TRUE
it would return an error. It now returns results correctly.
Added support for the PISA Young Adult Follow-up Study (YAFS). See readPISA_YAFS
for more information.
readICCS
now works for 2016 ICCS data.
supports linking error in NAEP assessments that have combined PBA and DBA formats, including NAEP 2018 Social Studies (grade 8); 2019 Science (grades 4, 8, and 12); 2019 Mathematics and Reading (grades 4, 8, and 12). See the vignette for details on the implementation at https://www.air.org/sites/default/files/2021-07/EdSurvey-linkingerror.pdf.
mml.sdf
a new function that estimates student scale scores using the marginal maximum likelihood regression estimation method (as the program AM does). This is an alternative method to the plausible values approach.
mixed.sdf
now gives clearer information about groups and weights, similar to the updated WeMix::mix
function.
achievementLevels
now accepts multiple sets of plausible values at once, allowing joint achievement levels to be calculated. Because of this the discrete
and cumulative
results column names now include the name of the variable. So for a composite score it will now be labeled composite_Level
where it used to say Level
. Also the cutpoints
object now specifies the scores the cutpoints regard.
percentile
now has a new method “symmetric” that is similar to “simple” but returns the negative result if all of the values are multiplied by negative one. This method is faster than “unbiased” which has a small bias correction.
percentile
used to report the number of people who were more extreme than any percentile in the nsmall
column. This was not used and so was removed.
percentile
now accepts a dofMethod
argument to allow either of the methods described in DoFCorrection
: WS
for the Welch-Satterthwaite formula or JR
for the Johnson-Rust formula. This defaults to JR
but was previously WS
.
edsurveyTable
now warns and removes rows with non-positive full sample weights or no outcome. The warnings also now include the number of rows removed by the filter. Non-positive and NA weights are removed only if the returnMeans
or returnSepct
are TRUE
. Missing outcome scores are removed only if returnMeans
is TRUE
.
we simplified all the code in the statistical methods, including rq.sdf
, lm.sdf
, and percentile
.
after subsetting an edsurvey.data.frame, the cache (user defined variables) used to throw errors. This now works.
mixed.sdf
used to not correctly identify covariances when outputting the variance tables. Thanks to Endraswara Wiryawan for identifying this bug.
readTIMSS
previously attempted to merge on 2019 teachers using the student's grade level. This led to many teacher variables not being merged on in AAD (Abu Dhabi, UAE), ARE (United Arab Emirates), and NZL (New Zealand). Thanks to Damian Campbell for identifying this bug.
fixed readTIMSSAdv
bug for 1995 data relating to storing column dimension information in cache.
when EdSurvey
is attached, it no longer sets the options value of timeout
. Instead it only sets it to the larger of the current value and an hour when downloading files.
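The "larger of the current value and an hour" behavior can be sketched in base R as follows. The helper name `with_min_timeout` is hypothetical and is not part of EdSurvey; the sketch only illustrates raising the `timeout` option temporarily and restoring it afterwards.

```r
# Hypothetical helper: evaluate a download step with a timeout of at least
# one hour, then restore the user's original setting on exit.
with_min_timeout <- function(expr, min_seconds = 60 * 60) {
  old <- getOption("timeout")
  on.exit(options(timeout = old), add = TRUE)
  options(timeout = max(old, min_seconds))
  expr  # lazily evaluated here, after the option has been raised
}

before <- getOption("timeout")
inside <- with_min_timeout(getOption("timeout"))
after <- getOption("timeout")
```

A user who has already set a timeout longer than an hour keeps that value, because `max()` never lowers it.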
downloadTIMSS
and readTIMSS
were updated to include 2019.
downloadPIAAC
was updated so data/codebook downloads perform more robustly on different OS.
readPIAAC
was updated so the number of replicate weights in the resulting edsurvey.data.frame
matches the OECD source data. Previously, this number was hard coded, resulting in duplicate replicate weights.
readPISA
added 2018 financial literacy.
downloadICILS
now references instructions for downloading the 2018 ICILS data set, as the readICILS
function supported ICILS 2018 as of EdSurvey v2.4.0
when calling $
on an edsurvey.data.frame with a weight variable, only the vector of full sample weights is returned now. This used to return a matrix (when relevant) that also included the replicate weights.
downloadPISA
and readPISA
now support the PISA 2018 Financial Literacy (FIN) database.
lm.sdf
, glm.sdf
, and edsurveyTable
now support surveys where the PSU and stratum variables vary by weight variable (usually longitudinal surveys) for Taylor series variance estimation.
mixed.sdf
now has coef
and vcov
methods.
rq.sdf
now returns the rho statistic
waldTest
now accommodates JK1 denominator degrees of freedom explicitly. When the F-test is not valid, it returns only the chi-squared test.
waldTest
now covers rq.sdf
and mixed.sdf
models.
the vignette was expanded to include more links.
The ==
(equal) and !=
(not equal) comparison operators now work on edsurvey.data.frame
and/or edsurvey.data.frame.list
objects to compare objects by their attributes (e.g., survey, year, country).
readNAEP
now allows fr2 paths that are separate from the usual location and is more robust to variation in capitalization, but requires that a file name be unique, regardless of capitalization.
getPSUVar
and getStratumVar
are now more robust and return more informative warnings.
Added match.arg
parameter matching for the following functions/arguments for usability: downloadPISA (database)
, readCivEDICCS (dataSet, gradeLvl)
, readICILS (dataSet)
, readPISA (database, cognitive)
, readTALIS (isced, dataLevel)
, readTIMSS (gradeLvl)
, readTIMSSAdv (subject)
.
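For readers unfamiliar with `match.arg`, the following base R sketch shows the kind of argument matching it provides. The function `read_demo` and its choices are hypothetical stand-ins, not the actual EdSurvey signatures.

```r
# Hypothetical reader with a restricted set of choices for `database`.
read_demo <- function(database = c("INT", "CBA", "FIN")) {
  # match.arg() picks the first choice by default, accepts unambiguous
  # partial matches, and raises an error for anything else.
  database <- match.arg(database)
  database
}

default_choice <- read_demo()      # no argument: first choice is used
partial_choice <- read_demo("F")   # partial match resolves to "FIN"
bad_choice <- tryCatch(read_demo("XYZ"), error = function(e) "error")
```

The benefit for users is an early, informative error when an argument value is misspelled, rather than a failure deeper in the read or download.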
readPIAAC
was fixed so it can now find cached files for usa17
and usa12_14
. Additionally, match.arg
checks were added to better check user inputted data to the countries
and usaOption
arguments.
The readNAEP
function has updates for the frPath
argument to correctly handle relative paths, path expansion (~), and additional file validation checks.
The gap
function now correctly parses expressions that use a variable in the global environment.
The summary2
function now gives a clearer error when something that is not the name of a weight column is passed to the weightVar
argument.
ECLS_B is now a supported dataset with the readECLS_B
function.
downloadECLS_K
and readECLS_K2011
now download and read in, respectively, the latest ECLS-K:2011 K-5 data.
readTALIS
was updated to work with TALIS 2018.
downloadPISA
added support for 2018 and now downloads 2018 data. OECD compressed the 2018 PISA with an algorithm not implemented in R (DEFLATE64), and so users are instructed to uncompress it outside of R and EdSurvey before using the data.
readPIAAC
was updated to work with files OECD released in early 2020. This includes adding a usaOption
argument that allows the user to select between the 2012-2014 file and the 2017 file for U.S. data.
readICILS
was updated to work with files IEA released early in 2020.
EdSurvey now supports NAEP assessments with no plausible value variables, such as some National Indian Education Study (NIES) data.
EdSurvey now supports TIMSS 2019 pre-release files, including bridge files.
For readHSLS
and readELS
we fixed an issue parsing .sav
variable format definitions relating to integer definitions with a display of more than 10 digits. We believe this does not change how EdSurvey reads in any public use files. The issue can arise if the user converts restricted use files with Stat/Transfer.
Some IEA datasets have convenience variables that are discretized plausible values, and EdSurvey incorrectly identified them as plausible value variables. EdSurvey already internally generates similar convenience variables based on the existing plausible values and does not use the IEA variables. As of 2.6.0, these columns are not linked as plausible value variables and are ignored. Affected datasets include TIMSS, TIMSS Advanced, PIRLS, and ePIRLS.
cor.sdf
now reports standard errors linearized back to the correlation space in the result value se
. It also reports the confidence interval in the Fisher-Z space as ZconfidenceInterval
and a confidence interval mapped back to the correlation space (without linearization) as confidenceInterval
.
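The relationship between the three reported quantities can be sketched in base R. This is an illustration under stated assumptions, not the cor.sdf implementation: the correlation and standard error are made-up numbers, and a normal critical value is assumed for the interval.

```r
r <- 0.60      # estimated Pearson correlation (made up)
se_z <- 0.05   # standard error in the Fisher-Z space (made up)

# Fisher-Z transformation of the correlation.
z <- atanh(r)

# Linearized (delta method) standard error back in the correlation space:
# d tanh(z) / dz = 1 - tanh(z)^2 = 1 - r^2.
se_r <- se_z * (1 - r^2)

# Confidence interval in the Fisher-Z space (normal critical value assumed).
crit <- qnorm(0.975)
z_ci <- z + c(-1, 1) * crit * se_z

# Interval mapped back to the correlation space without linearization.
r_ci <- tanh(z_ci)
```

Because `tanh` is nonlinear, the mapped-back interval is not symmetric around `r`, which is why it is reported separately from the linearized standard error.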
cor.sdf
now allows users to select the Fisher-Z transformation or no transformation with the transformation
argument. The default of using the Fisher Z for Pearson and no transformation for other correlation types remains the same.
cor.sdf
now allows the user to set a value of jrrIMax
to allow faster computation.
Updated oddsRatio
to use t-statistic instead of Z, and to identify the confidence interval in the linear space and then convert.
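A minimal base R sketch of that approach: build the interval on the coefficient (linear) scale with a t critical value, then exponentiate. The coefficient, standard error, and degrees of freedom below are made-up values, and this is not the oddsRatio source code.

```r
b <- 0.40     # logit coefficient (made up)
se <- 0.10    # its standard error (made up)
dof <- 60     # degrees of freedom (made up)

# t critical value rather than the normal (Z) critical value.
crit <- qt(0.975, df = dof)

# Confidence interval in the linear (log-odds) space.
ci_linear <- b + c(-1, 1) * crit * se

# Convert the point estimate and the interval to the odds-ratio scale.
odds_ratio <- exp(b)
ci_or <- exp(ci_linear)
```

With finite degrees of freedom the t critical value exceeds the normal one, so the resulting interval is somewhat wider than a Z-based interval.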
The +
function now works on edsurvey.data.frame
and/or edsurvey.data.frame.list
objects to build an edsurvey.data.frame.list
. See ?edsurvey.data.frame.list
for examples.
The %in%
function now works on edsurvey.data.frame.list
objects, identifying if a survey is in an edsurvey.data.frame.list
. See ?edsurvey.data.frame.list
for examples.
percentile
, achievementLevels
, and edsurveyTable
now warns when an edsurvey.data.frame.list
is passed to it with an edsurvey.data.frame
on it that cannot return valid results for the requested statistics. The output is NA
for these surveys.
gap
now supports linking error in PISA for math, reading, and science when a user sets includeLinkingError
to TRUE
.
contourPlot
deprecated the argument colors
which only set density colors. Instead, the function now uses densityColors
and pointColors
, which set the density and point colors separately. This will cause calls to contourPlot
that use colors
to not function with 2.6.0.
When a summary2
call is made with a light.edsurvey.data.frame
, unfilled levels will not be returned in the table anymore.
Users can now assign a variable with $, including subsets. Variables on the original data cannot be removed but can be overwritten. All changes are for the session only and do not change the data on the disk.
getData
now closes LaF connections regardless of how it exits (e.g., with an error). This should improve the stability of the package.
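The general pattern for closing a connection no matter how a function exits is `on.exit`. The sketch below uses an ordinary base R file connection rather than an LaF connection, and `read_first_line` is a hypothetical function written only for this illustration.

```r
# Hypothetical reader that always closes its connection, even on error.
read_first_line <- function(path, fail = FALSE) {
  con <- file(path, open = "r")
  on.exit(close(con), add = TRUE)  # runs on normal return and on error
  if (fail) stop("simulated failure while reading")
  readLines(con, n = 1)
}

tmp <- tempfile()
writeLines(c("first", "second"), tmp)

open_before <- nrow(showConnections())
res_error <- try(read_first_line(tmp, fail = TRUE), silent = TRUE)
open_after <- nrow(showConnections())

res_ok <- read_first_line(tmp)
unlink(tmp)
```

The number of open user connections is the same before and after the failing call, which is the property that keeps long sessions stable.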
An edsurvey.data.frame
now can specify a fr2path
. This is used only for NAEP surveys and no functionality depends on it, but it may be used in the future. The value of fr2path
may always be set to NA
.
achievementLevels
now warns when an edsurvey.data.frame.list
is passed to it with an edsurvey.data.frame
on it that cannot return valid results for the requested statistics. The output is NA
for these surveys.
cor.sdf
and percentile
incorrectly calculated standard errors when the pre-multiplier for the replicate weights was not one. Users can see that for a survey using getAttributes(data, "jkSumMultiplier")
for their data.
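To see why the pre-multiplier matters, consider a sketch of a jackknife variance in base R. It assumes the common form in which the sum of squared deviations of the replicate estimates from the full-sample estimate is scaled by the multiplier; the estimates and the multiplier value are made up, and the exact EdSurvey formulas are in its statistics documentation.

```r
full_est <- 10                       # full-sample estimate (made up)
rep_est <- c(10.2, 9.8, 10.1, 9.9)   # replicate-weight estimates (made up)
jk_multiplier <- 0.5                 # assumed pre-multiplier, not equal to one

# Jackknife variance: multiplier times the sum of squared deviations.
jk_var <- jk_multiplier * sum((rep_est - full_est)^2)
jk_se <- sqrt(jk_var)

# Treating the multiplier as one would overstate the variance here.
jk_var_if_one <- sum((rep_est - full_est)^2)
```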
cor.sdf
previously failed to calculate some correlations when some records had non-zero full sample weights and zero replicate weights. Now results are still calculated.
With TIMSS data, when a user updated a teacher-level variable, it used to prevent the user from using the edsurvey.data.frame
for non-teacher-level analysis. We fixed this issue. Thanks to Anders Astrup Christensen for identifying this issue and sharing reproducible code that is now incorporated into the package as a test.
Added support for the 2007-2008 Beginning Teacher Longitudinal Study (see readBTLS
) for Waves 1 to 5.
readPISA
uses less memory when reading in PISA 2015 data. This is because it now uses the haven
package's read_sav
new skip
and n_max
arguments to read in portions of files. This should save time on read in.
Other PISA files were changed to use base R read functions; this results in a different ordering of columns in 2012 and 2015.
Some PISA 2003 variables were read in as integers and are now read in as real values, mostly hours variables.
After the OECD files were updated, downloadPIAAC
and readPIAAC
now properly use the 2012 USA file for Cycle 1.
To allow both 4.0 and 3.6 compatibility, an edsurvey.data.frame.list
now stores covs
as characters in preference to factors.
EdSurvey
no longer imports readr
, rvest
, stringi
, stringr
, or xml2
. In addition, RColorBrewer
was imported but is now suggested.
contourPlot
now uses heat.colors
by default.
Several data files were stored in text format in functions and are now stored in inst/extdata
as rds
files.
The news for 2.4.0 was edited to note that the gap
linking error in NAEP is only for mathematics and reading.
lm.sdf
now rounds the printed coefficients matrix in the standard way.
Fixed a bug in edsurveyTable
that caused it to return incorrectly labeled output if multiple covariates were tabbed with weights.
A broken link in the vignette was fixed.
Added support for the 2009 High School Longitudinal Study (see downloadHSLS
and readHSLS
). These do not support Restricted Use Data (RUD).
Added support for the 2002 Education Longitudinal Study (see downloadELS
and readELS
).
Added the ECLS Kindergarten Class of 1998-1999 Study. These datasets can be downloaded with downloadECLS_K
and read in with readECLS_K1998
. This was added in 2.3.0 but first added to the NEWS for 2.4.0.
Added support for ePIRLS (see read_ePIRLS
and download_ePIRLS
). This was added in 2.3.0 but first added to the NEWS for 2.4.0.
Added support for 2018 for the existing readICILS
function. Thanks to Jeppe Bundsgaard of Aarhus University, Danish School of Education, for contributing the code for this.
OECD moved the PIAAC data to a new location, and the downloadPIAAC
function now uses the new URL.
The PISA 2015 data cache could have been formed incorrectly; that is fixed. When PISA 2015 datasets are first used with 2.4 it will take time to re-cache the data. The process now also uses far less memory.
The PISA data had incorrect PSU and stratum variables for most years. They are all fixed except for 2000, which we do not believe has a PSU variable on the file. Several strata have only one PSU, so the design still needs to be edited by the user to get correct Taylor series sampling variance estimates.
The mixed.sdf
function now correctly aggregates results and has its methodology documented. It no longer supports binomial models and has several arguments deprecated as a result.
The gap
function argument varMethod
is deprecated. The function uses only jackknife variance estimation.
The gap
function now accounts for linking error between NAEP paper and digitally based assessments for mathematics and reading.
The subset
function used to fail when a global variable shared a name with a column on the data; it now works.
The percentile
function has been updated to a formula that generates survey percentiles that are robust to transformation. For example, if the values are multiplied by a constant, then the percentiles move by that same constant.
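The scaling property can be illustrated with base R's unweighted `quantile`. This is only an analogy: the EdSurvey percentile function uses survey weights and its own formula, and the data below are made up.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)   # illustrative values only
p <- c(0.25, 0.50, 0.75)
k <- 10                          # positive constant applied to every value

q_x <- quantile(x, probs = p, names = FALSE)
q_kx <- quantile(k * x, probs = p, names = FALSE)

# Robust to transformation: the percentiles of k * x are k times those of x.
scaled_matches <- isTRUE(all.equal(q_kx, k * q_x))
```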
The lm.sdf
function header now prints the number of plausible values used (when they are used) as well as the number of plausible values used in the sampling variance (jrrIMax
).
The levelsSDF
function now makes more informative warnings when passed a light.edsurvey.data.frame
.
The print
function for cor.sdf
now prints a final new line.
When searchSDF
was called with more than one search string and the levels
argument was set to TRUE
, a useless warning was issued. The warning was removed.
More EdSurvey
functions wrap to the width of the console.
The rq.sdf
function adds quantile regression to the package. See the rq.sdf
documentation for more information.
The functions getStratumVar
and getPSUVar
were added to give users stratum and PSU variables for surveys and Taylor series analyses.
The summary2
function now accepts a vector of variables.
The searchSDF
function now accepts a vector string
input so that results can be filtered.
The formula for degrees of freedom when Taylor series variance estimation is used has been updated. The new formula is derived in the statistics vignette. See https://www.air.org/sites/default/files/EdSurvey-Statistics.pdf.
PISA 2015 is now supported.
The waldTest
function allows the user to test composite hypotheses–hypotheses with multiple coefficients involved–even when the data include plausible values. Because there is no likelihood test for plausible values nor residuals, the Wald test fills the role of the likelihood ratio test, ANOVA, and F-test.
The mvrlm
function adds multivariate regression (a regression with multiple outcomes) to the package. See the mvrlm
documentation for more information.
Survey weighted mixed models can now be fit with the mixed.sdf
function. Both linear and logistic models can be fit. These models are limited to 2 levels (one level with random effects).
Regressions can now output standardized regression coefficients using summary(myLm, src=TRUE)
. When the lm.sdf
call includes standardizeWithSamplingVar=TRUE
the standard error of the standardized regression coefficient accounts for the sampling error and measurement error (when applicable). Otherwise, the standard deviations are assumed to be measured without error.
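As a reminder of what a standardized regression coefficient is, here is a base R sketch using `lm` and the built-in `mtcars` data. It treats the standard deviations as known without error, and it is unweighted, so it is an illustration of the concept rather than of lm.sdf.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
b <- coef(fit)[c("wt", "hp")]

# Standardized coefficient: b * sd(predictor) / sd(outcome).
std_b <- b * c(sd(mtcars$wt), sd(mtcars$hp)) / sd(mtcars$mpg)

# Cross-check: refit the model on z-scored variables.
scaled <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp")]))
fit_scaled <- lm(mpg ~ wt + hp, data = scaled)
std_b_check <- coef(fit_scaled)[c("wt", "hp")]
```

The two routes agree, which is why the standardized coefficients can be computed after the fact from an unstandardized fit.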
Added summary2
function to produce unweighted and weighted descriptive statistics of a variable in edsurvey.data.frame
or light.edsurvey.data.frame
Added $ variable access to edsurvey.data.frame. e.g. sdf$dsex
Updated covariance matrix estimation in the vcov
function for lm.sdf
and glm.sdf
to work when varEstInputs
was not returned
Added covariance matrix estimation for lm.sdf
when Taylor series variance estimation was used.
Added rebindAttributes
function to make dplyr interaction more smooth. See the rebindAttributes
documentation for an example.
When printing an edsurvey.data.frame
it now says the survey, year, subject, and country at the top. The dimensions are moved down.
In gap
, achievement levels can be specified with partial matches. This helps out when achievement levels have long names.
Added the ECLS Kindergarten Class of 2010-2011 Study. Longitudinal datasets can be downloaded with downloadECLS_K
and read in with readECLS_K2011
.
PIRLS 2016 is now supported.
Added returnNumberOfPSU
in achievementLevels
, percentile
, lm.sdf
, and gap
to report the number of primary sampling units (PSUs) used to calculate a statistic.
Added oddsRatio
helper function for logit results to show odds ratios.
When running a Pearson correlation on a discrete variable, cor.sdf
function by default condenses the occupied response codes to be consecutive integers. This can now be turned off by setting the condenseLevels
argument to FALSE
so that the code book levels are used instead.
The glm.sdf
function now uses the glm2
package to fit models. This package converges on a broader class of models.
The EdSurvey
package no longer sets the number of threads used by the required data.table
package to one when EdSurvey is being attached. The issue is now fixed in data.table
.
Across the download functions, when a file is missing the warning text was homogenized. Additionally, all download functions now support a verbose
argument that can be used to make downloads silent.
The lm.sdf
and glm.sdf
functions now accept formulas that use the I()
function or other unevaluated expressions to the left of the tilde. Previously only a single variable could be named.
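The formula syntax in question is the same one base R's `lm` accepts. The sketch below uses `lm` and the built-in `mtcars` data as a stand-in, since the EdSurvey functions need survey data; it only shows what an expression on the left-hand side looks like.

```r
# Plain outcome on the left-hand side.
fit_raw <- lm(mpg ~ wt, data = mtcars)

# Expression on the left-hand side, protected with I().
fit_lhs <- lm(I(mpg * 2) ~ wt, data = mtcars)

# Doubling the outcome doubles every coefficient.
coef_ratio <- unname(coef(fit_lhs) / coef(fit_raw))
```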
recode.sdf
now checks that each recode has only a to
and from
in it.
The edsurveyTable
function now works without an RHS variable, allowing the formula y ~ 1
to return overall means.
The percentile
function used to produce output even if asked to produce a percentile outside of the valid range (0 to 100). Now it prints a message if users input at least one percentile outside of the valid range, and stops if all percentiles are invalid.
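The validation logic described here can be sketched in base R. The function `check_percentiles` is hypothetical, and dropping the invalid values before continuing is an assumption of this sketch rather than documented behavior.

```r
# Hypothetical validator for requested percentiles.
check_percentiles <- function(percentiles) {
  valid <- percentiles >= 0 & percentiles <= 100
  if (!any(valid)) {
    stop("all requested percentiles are outside the valid range (0 to 100)")
  }
  if (!all(valid)) {
    message("ignoring percentiles outside the valid range (0 to 100): ",
            paste(percentiles[!valid], collapse = ", "))
  }
  percentiles[valid]  # assumption: continue with the valid values only
}

kept <- suppressMessages(check_percentiles(c(10, 50, 120)))
all_bad <- tryCatch(check_percentiles(c(-5, 150)),
                    error = function(e) "stopped")
```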
The getData
function now removes rows with omitted levels after being recoded.
All SPSS (.sav) file reads using the haven
package set the user_na = TRUE
flag to ensure no defined missing/omitted values are automatically converted to NA
values prematurely.
For consistency with other download functions, downloadPISA
now uses a years
argument instead of a year
argument.
Running an edsurveyTable
on an edsurvey.data.frame.list
used to have the potential to create invalid (unprintable) output if the factor levels did not agree on every element of the edsurvey.data.frame.list
. They now return printable output.
An edsurveyTable
could produce a standard error when there was data from only one stratum. It now produces an NA
standard error.
Works with the Trends in International Mathematics and Science Study (TIMSS), TIMSS Advanced, the Progress in International Reading Literacy Study (PIRLS), and the International Computer and Information Literacy Study (ICILS), International Civic and Citizenship Education Study (ICCS), the Civic Education Study (CivEd), the Program for International Student Assessment (PISA), the Program for the International Assessment of Adult Competencies (PIAAC), and the Teaching and Learning International Survey (TALIS).
International datasets can be downloaded with downloadTIMSS
, downloadTIMSSAdv
,
downloadPIRLS
, downloadICILS
, downloadICCS
,
downloadCivEDICCS
, downloadPISA
, downloadPIAAC
, downloadTALIS
.
International datasets can be loaded with readTIMSS
, readTIMSSAdv
,
readPIRLS
, readICILS
, readICCS
,
readCivEDICCS
, readPISA
, readPIAAC
, readTALIS
.
Added logit.sdf
and probit.sdf
functions with support
for survey item responses.
Added gap
that compares the average, percentile, achievement level, or percentage of survey responses between two groups that potentially share members.
Added percentile
that calculates the percentiles of a numeric variable.
Added showCodebook
that retrieves variable names, variable labels, and value labels for an edsurvey.data.frame
, light.edsurvey.data.frame
, or edsurvey.data.frame.list
.
Redesigned achievementLevels
, edsurveyTable
functions for
significantly faster computation with a much smaller memory footprint. We also
made error messages and outputs more informative.
More informative error message and output for cor.sdf
,
levelsSDF
, getPlausibleValue
, print.edsurvey.data.frame
,
searchSDF
, showPlausibleValues
, showWeights
, and getData
.
lm.sdf
and glm.sdf
are now S3 methods extended from
stats::lm
so users can call the function using lm
and glm
.
Added the contourPlot
for regression diagnostics.
Added the recode.sdf
function for recoding levels within variables.
Added the rename.sdf
function for modifying variable names.
Added the append.edsurvey.data.frame.list
function to return
a list of sdfs from either an edsurvey.data.frame.list or a single edsurvey.data.frame.
Manual documentation was refreshed.
Moved vignettes to the AIR website at https://www.air.org/project/nces-data-r-project-edsurvey or see links in the vignette included in this package.
Added a new vignette on “Exploratory Data Analysis on NCES Data” that provides examples of conducting exploratory data analysis on NAEP data.
Added new vignette on “Calculating Adjusted p-Values From EdSurvey Results” to the AIR website describing the basics of adjusting p-Values to account for multiple comparisons.
Added new vignette on “Using EdSurvey to Analyze TIMSS Data” to the AIR website providing an introduction to the methods used in analysis of large-scale educational assessment programs such as Trends in International Mathematics and Science Study (TIMSS) using the EdSurvey package. The vignette covers topics such as preparing the R environment for processing, creating summary tables, running linear regression models, and correlating variables.
Added new vignette on “Using EdSurvey for Trend Analysis” to the AIR website describing the methods used in the EdSurvey package to conduct analyses of statistics that change over time in large-scale educational studies.
Added new vignette on “Producing LaTeX Tables From edsurveyTable Results With edsurveyTable2pdf” to the AIR website detailing the creation of pdf summary tables from summary results using the edsurveyTable2pdf function.
Added new methodology documentation on “Methods Used for Gap Analysis in EdSurvey” to the AIR website covering the methods and comparing the gap analysis results of the EdSurvey package to the NAEP Data Explorer.
Added new methodology documentation on “Methods Used for Estimating Percentiles in EdSurvey” to the AIR website describing the methods used to estimate percentiles.
Added new methodology documentation on “Weighted and Unweighted Correlation Methods for Large-Scale Educational Assessment: wCorr Formulas” to the AIR website detailing the methodology used by the wCorr R package for computing the Pearson, Spearman, polyserial, and polychoric correlations, with and without weights applied. See https://www.air.org/resource/report/weighted-and-unweighted-correlation-methods-large-scale-educational-assessment.
Fixed connection issue associated with closing connections to an LaF
.
readNAEP function now works on a case sensitive file system.
Vignettes now should appear in numerical order on CRAN.
Examples now name most arguments.
Vignettes now name most arguments.
Some print functions now report the number of plausible values.