R bindings for the uchardet library, an encoding detector library derived from Mozilla's universal charset detector. It takes a sequence of bytes in an unknown character encoding, without any additional information, attempts to determine the encoding of the text, and returns the encoding name in an iconv-compatible format.
Key features:
To install the package from CRAN, run the following command:
install.packages("uchardet")
Alternatively, you can install the development version with the install_gitlab() function from the remotes package:
This package contains compiled code, so on Windows you need Rtools to install it from source.
Installation from source requires the uchardet library and its headers. On Linux and macOS, the configure script tries to find them with pkg-config or in the system include and library paths. You can set the include and library paths explicitly with the UCHARDET_INCLUDES and UCHARDET_LIBS configure variables.
If the system uchardet library is not found, the builtin copy is compiled from source. You can force compilation of the builtin library with the --with-builtin-uchardet configure argument.
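The configure argument can also be passed from within R. The sketch below assumes a working compiler toolchain (Rtools on Windows) and network access to a CRAN mirror; `repos`, `type` and `configure.args` are standard arguments of `utils::install.packages()`, the mirror URL is just one choice, and `--with-builtin-uchardet` is the flag described above.

```r
# Build uchardet from source, forcing the builtin copy of the C library
# instead of a system-wide one (assumes a compiler toolchain is available).
utils::install.packages(
  "uchardet",
  repos = "https://cloud.r-project.org",
  type = "source",
  configure.args = "--with-builtin-uchardet"
)
```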
# load the package
library(uchardet)
# detect string encoding
ascii <- "Hello, useR!"
print(ascii)
#> [1] "Hello, useR!"
detect_str_enc(ascii)
#> [1] "ASCII"
utf8 <- "\u4e0b\u5348\u597d"
print(utf8)
#> [1] "下午好"
detect_str_enc(utf8)
#> [1] "UTF-8"
# detect raw vector encoding
detect_raw_enc(charToRaw(ascii))
#> [1] "ASCII"
detect_raw_enc(charToRaw(utf8))
#> [1] "UTF-8"
# detect file encoding
ascii_file <- tempfile()
writeLines(ascii, ascii_file)
detect_file_enc(ascii_file)
#> /tmp/RtmpLPr6Ds/file5cf9a735cf8d0
#> "ASCII"
utf8_file <- tempfile()
writeLines(utf8, utf8_file)
detect_file_enc(utf8_file)
#> /tmp/RtmpLPr6Ds/file5cf9a7305c0f7
#> "UTF-8"
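Because the returned names are iconv-compatible, a detected encoding can be passed straight to iconv(). The helper below is a hypothetical sketch, not part of the package. It assumes the file fits in memory and contains no embedded NUL bytes (so UTF-16 input is out of scope), and it treats an NA result as a detection failure, which is also an assumption.

```r
library(uchardet)

# Hypothetical helper (not part of uchardet): read a file of unknown
# encoding and return its contents as a single UTF-8 string.
read_as_utf8 <- function(path) {
  # Read the raw bytes so that R does not reinterpret them on input
  bytes <- readBin(path, what = "raw", n = file.size(path))
  enc <- detect_raw_enc(bytes)
  # Assumption: a failed detection is reported as NA
  if (is.na(enc)) {
    stop("could not detect the encoding of ", path)
  }
  # rawToChar() fails on embedded NUL bytes, e.g. in UTF-16 files
  text <- rawToChar(bytes)
  iconv(text, from = enc, to = "UTF-8")
}
```

Applied to utf8_file from the example above, it should return the original text together with the trailing newline added by writeLines().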
Use the following command to open the bug report submission page:
bug.report(package = "uchardet")
Before reporting a bug or submitting an issue, please do the following:
- Check whether the problem has already been addressed, using the news(package = "uchardet", Version == packageVersion("uchardet")) command;
- Make sure the bug is caused by the uchardet package, not by other packages;
- Attach traceback() and sessionInfo() output to the bug report. It may save a lot of time.
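Part of the information requested above can be gathered in one step. The helper below is hypothetical (not part of uchardet): it collects the installed package version and the sessionInfo() output as text. traceback() is left out because it is only meaningful when called interactively right after the error occurs.

```r
# Hypothetical helper (not part of uchardet): collect version and
# session details as a character vector ready to paste into an issue.
collect_bug_info <- function(pkg = "uchardet") {
  c(
    paste("Package version:", as.character(utils::packageVersion(pkg))),
    "",
    utils::capture.output(utils::sessionInfo())
  )
}

# writeLines(collect_bug_info())
```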
The uchardet package is distributed under the GPLv2 license.