- `read_bim`, `read_fam`, `read_ind`, and `read_snp` functions.
- `write_bed` written in Rcpp and thoroughly tested against the `BEDMatrix` package.
- `write_bed` error message for invalid data, documentation.
- `write_bed` tests.
- `write_fam`, `write_bim`, `write_ind`, `write_snp` functions.
- `read_*` code, updated docs and tests.
- `make_fam`, `make_bim`, and `write_plink` functions.
- `read_fam` bug (used to require phenotypes to be integers, now they can be double numbers).
- `verbose` option to `write_bed`.
- `write_plink` now returns `NULL` invisibly.
- `require_files_plink`, `delete_files_plink`.
- `ind_to_fam`, `sex_to_int`, `sex_to_char`.
- `read_bed` and `read_plink`! Now all Plink reading and writing operations are supported.
- `BEDMatrix`, `snpStats`, and `lfa`.
- `read_plink` now includes row and column names automatically.
- `read_bed` accepts either row and column names or just the numbers of rows and columns.
- `write_plink` checks these row and column names against the BIM and FAM tables for consistency, if these are all present.
- `BEDMatrix` in testing, since it leaves temporary files open, which on Windows do not get deleted and leave confusing error messages behind.
- `#include <cerrno>` to my cpp code.
- `read_phen` and `write_phen`, a phenotype format (very similar to Plink’s FAM) used by GCTA and EMMAX.
- `write_plink` returns the data it wrote, invisibly as a list. Most useful for auto-generated data.
- `man/figures/`
- `tidy_kinship` to transform a square symmetric matrix into a long-format table that is easy to sort and to annotate.
- `read_grm` and `write_grm` to read and write GCTA’s binary genetic relatedness matrix (GRM) format.
- `require_files_grm`, `delete_files_grm`, `require_files_phen`, and `delete_files_phen`.
- `validate_tab_generic`.
- `write_plink`, `write_bed`, and `write_bim` now have an `append` option, for writing extremely large files in parts.
- `write_eigenvec` and `read_eigenvec` to read and write Plink/GCTA eigenvector files.
- `count_lines`, which uses C++ code (via Rcpp) to count file lines extremely quickly. Intended for counting the number of individuals (from FAM and equivalent files) or the number of loci (from BIM and equivalent files) when these files are extremely large and no other information is needed from them.
- `read_eigenvec` added Plink 2 support via the `comment` option, which by default now treats data after `#` as comments. This enables automatic parsing of eigenvec files generated by Plink 2, whose header line starts with `#` (this header is ignored). Previously, parsing Plink 2 eigenvec files generated warnings and resulted in the first row being an additional row with all `NA` values.
- `read_bed` added a missing file check in the R code.
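The `count_lines` entry above says line counting is done in C++ (via Rcpp) for speed. The following is a rough sketch of why this can be fast. The hypothetical `count_lines_sketch` (not genio's actual code) reads the file in large binary chunks and counts newline bytes without ever building strings. It assumes that a final line without a trailing newline should still be counted.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, not genio's count_lines: count the lines of a file by
// scanning it in fixed-size binary chunks and counting '\n' bytes.
long long count_lines_sketch(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("Could not open file: " + path);
    }
    std::vector<char> buf(1 << 16);  // 64 KiB buffer, reused for every chunk
    long long lines = 0;
    char last = '\n';  // stays '\n' for an empty file, so it counts as 0 lines
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize n = in.gcount();  // bytes actually read (last chunk may be short)
        for (std::streamsize i = 0; i < n; ++i) {
            if (buf[i] == '\n') ++lines;
        }
        if (n > 0) last = buf[n - 1];
    }
    // Assumption: a final line without a trailing newline still counts.
    if (last != '\n') ++lines;
    return lines;
}
```

Reading fixed-size chunks keeps memory use constant regardless of file size. This matters for the extremely large FAM/BIM files the function targets.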
- `lfa` comparison.
  - The `lfa` fork no longer has the function `read.bed`, previously the slowest and most memory-hungry competitor that `genio::read_plink` was compared to.
- `genio` package doc.
- `geno_to_char` to convert numeric genotype codes (allele dosages such as 0, 1, 2) into character codes such as ‘A/A’, ‘A/G’, ‘G/G’ (depending on the locus).
- `read_matrix` and `write_matrix`, intended for admixture inference data.
- `read_bed`, which previously incorrectly stated that the numerical genotypes (allele dosages) counted alternative alleles (allele 2 in the BIM table), whereas in truth they count reference alleles (allele 1).
- `count_lines` now returns its value as an integer instead of a double (a very minor bug/annoyance fix).
- `lfa` from suggested packages (no connection anymore since the `lfa` comparison was removed from the vignette in version 1.0.19.9000).
- `read_bed` now reads `file` even if it doesn’t have a BED extension (as long as it exists).
- `ext` option.
- `read_*` functions to clarify behavior regarding the `file` and `ext` options.
- `real_path` to `add_ext_read` to make the distinction from `add_ext` clearer.
- `read_*` functions use `add_ext_read`, while all `write_*` functions use `add_ext`. Only `count_lines` switched from `add_ext` to `add_ext_read` (in addition to `read_bed`, which led to the earlier change), but `count_lines` didn’t have a default extension, so this change is less likely to matter.
- `NEWS.md` slightly to improve its automatic parsing.
- `read_bed` and `read_plink` no longer stop with an error if the input BED file has non-zero padding bits.
  - The `plink2` binary and the `BEDMatrix` R package load this file without complaining about the non-zero pads, so I decided to match that behavior. I verified that `genio`’s data agrees with `BEDMatrix` after the fix.
- `write_bed/plink` with `append = TRUE` debugged to write in “binary” mode.
  - The `append` option was introduced in 1.0.15.9000 (2020-07-03).
- Replaced `readr::read_table2` with `readr::read_table`.
  - `read_table2()` was deprecated in readr 2.0.0. Please use `read_table()` instead.
- `readr` (>= 2.0.0, already on CRAN).
- Replaced `pryr::object_size` with `lobstr::obj_size` (a suggested package used in the vignette only; the former was recently superseded by the latter).
- `pryr::object_size` output (now of class `lobstr_bytes`), which triggered a CRAN warning.
- `read_eigenvec` fixed this warning: “`value` argument of `names<-` must be a character vector as of tibble 3.0.0.”
- `write_bed`, `write_plink`, and `count_lines` fixed a bug: writing (or reading) failed on Unix systems if the path started with “~/”. The problem was that the path wasn’t expanded in the C++ code.
  For example, `write_plink( '~/test', X )` failed with this message:
  ```
  Writing: ~/test.bed
  Error in write_bed_cpp(file, X, append = append) :
  Could not open BED file `~/test.bed` for writing: No such file or directory
  Calls: write_plink -> write_bed -> write_bed_cpp
  Execution halted
  ```
  Thanks to Bingsong Zhang for reporting the bug!
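`fopen()` and `std::ofstream` open paths literally. The `~` shortcut is expanded by shells (and by base R's `path.expand()`), not by the C/C++ file APIs. That is why the C++ code in the error above looked for a directory literally named `~`. The following is a minimal C++ sketch of expanding a leading `~` against `$HOME`. `expand_tilde` is a hypothetical name, and genio's actual fix may differ; for example, it could expand the path on the R side before calling into C++.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not genio's code): expand a leading "~" to $HOME,
// because fopen() and std::ofstream treat "~" as an ordinary directory name.
std::string expand_tilde(const std::string& path) {
    if (path.empty() || path[0] != '~') {
        return path;  // nothing to expand
    }
    if (path.size() > 1 && path[1] != '/') {
        return path;  // "~user/..." forms are not handled in this sketch
    }
    const char* home = std::getenv("HOME");
    if (home == nullptr) {
        return path;  // no HOME to expand with; leave the path unchanged
    }
    return std::string(home) + path.substr(1);  // "~/test.bed" -> "$HOME/test.bed"
}
```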
- `read_eigenvec` and `write_eigenvec` have a new option `plink2` for better handling of files with headers in the default plink2 style.
- `count_lines` and all `read_*` functions, which use `add_ext_read` internally to sort out file paths:
  - With `ext = NA`, they now find files that end in a `.gz` extension that was not specified (previously those files were incorrectly not found).
  - For example, `read_matrix( 'my-file', ext = NA )` now finds and reads `my-file.gz` if it exists and `my-file` does not exist.
- `README` fixed the GitHub installation instructions to build the vignette, and explained how to view it.
- `read_grm` added several options to facilitate reading GRM-like formats produced by `plink2`, particularly data produced by `--make-king` with the `bin` or `bin4` options. Added options:
  - `ext` to specify alternate shared extensions (like “grm” or “king”).
  - `shape` to specify whether the input is a full “square” matrix, a “triangle” with diagonal (default for GRM), or a “strict” triangle without diagonal (for KING-robust).
  - `size_bytes` to parse `bin4`/GRM (4 bytes per value) or `bin` (8 bytes per value) plink2 data.
  - `comment` to control comment characters in the `<ext>.id` file.
- `vec_to_mat_sym` and `mat_sym_to_vec` added option `strict` to exclude the diagonal in their transformations.
- `read_tab_generic` added option `comment` to set comment characters.
- `write_grm` added the same options that were added to `read_grm` the previous day (see there), to write GRM-like formats produced by `plink2`, particularly data produced by `--make-king` with the `bin` or `bin4` options.
- `read_grm` edited documentation only, in particular adding parsing examples for various plink2 `--make-king` outputs.
- `write_bed` now checks whether the output directory exists before attempting to open the file for writing in the C++ part of the code.
  The original code crashed “ruthlessly” in RStudio if the path contained a directory that does not exist, triggering an error such as this one on a terminal:
  ```
  *** buffer overflow detected ***: terminated
  Aborted (core dumped)
  ```
  The new code produces an ordinary (fatal) error message in R without the buffer overflow.
  Bug reported by Richel Bilderbeek (thanks!)
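The following is a minimal C++17 sketch of this kind of pre-open check. It uses `std::filesystem` to confirm that the parent directory exists before opening the output file. The helper name `open_output_checked` is hypothetical and not part of genio's API. Inside an Rcpp-exported function, an uncaught `std::exception` like the ones thrown here is converted into an ordinary R error rather than crashing the session.

```cpp
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not genio's API): fail with an ordinary exception
// if the parent directory is missing, instead of attempting the open.
std::ofstream open_output_checked(const std::string& file) {
    const fs::path parent = fs::path(file).parent_path();
    // An empty parent means a bare file name, i.e. the current directory.
    if (!parent.empty() && !fs::is_directory(parent)) {
        throw std::runtime_error("Output directory does not exist: " + parent.string());
    }
    std::ofstream out(file, std::ios::binary);
    if (!out) {
        throw std::runtime_error("Could not open file for writing: " + file);
    }
    return out;
}
```

On the R side, an equivalent guard could use base R's `dir.exists()` on the output directory before calling into C++.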
- `read_bim`, `write_bim`, and `geno_to_char`: reversed columns “ref” and “alt” in BIM table
  - `read_bim` now returns a tibble with allele names “alt” and “ref” in that order (columns still ordered as they appear in input file)
  - `write_bim` writes tables with column “alt” before “ref”
  - `geno_to_char` reverses the roles of “alt” and “ref” correspondingly so that the output remains the same as before these changes (the original outputs were correct as validated against the plink1 “ped” text genotypes).
- `cran-comments.md`