Detect the Language of Text
Franc has no external dependencies and supports 310 languages, covering all languages spoken by more than one million people. Franc is a port of the JavaScript project of the same name: https://github.com/wooorm/franc.
Simply supply the text, and franc detects its language:
#> [1] "afr"
#> [1] "ben"
#> [1] "nno"
Franc can also rank every candidate language by score, with the best match scaled to 1:
#> language score
#> 1 por 1.0000000
#> 2 src 0.8800937
#> 3 glg 0.8702576
#> 4 snn 0.8637002
#> 5 bos 0.8168618
#> 6 hrv 0.8103044
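The score table hints at how detection works: the input is compared with a model of each candidate language, and the resulting distances are rescaled so that the best match scores 1. The base-R sketch below illustrates that idea with a character-trigram rank ("out-of-place") distance in the style of Cavnar and Trenkle, which we understand to be close to the approach of the JavaScript franc. The helper names trigram_profile(), rank_distance() and score_languages(), the penalty of 300 and the one-sentence toy models are all assumptions made for illustration. This is not franc's code, and franc's real models are built from far larger texts.

```r
# Split text into overlapping character trigrams. The text is lowercased
# and padded with spaces so that word boundaries also form trigrams.
trigrams <- function(text) {
  s <- paste0(" ", tolower(text), " ")
  n <- nchar(s)
  if (n < 3) return(character(0))
  substring(s, 1:(n - 2), 3:n)
}

# Hypothetical helper: rank trigrams by frequency (most frequent first)
# and return a named vector mapping trigram -> rank (0-based).
trigram_profile <- function(text) {
  counts <- sort(table(trigrams(text)), decreasing = TRUE)
  setNames(seq_along(counts) - 1, names(counts))
}

# Hypothetical helper: out-of-place distance. Add the rank difference for
# trigrams the model knows, and a fixed penalty for trigrams it lacks.
rank_distance <- function(input, model, penalty = 300) {
  d <- vapply(names(input), function(tg) {
    if (tg %in% names(model)) abs(input[[tg]] - model[[tg]]) else penalty
  }, numeric(1))
  sum(d)
}

# Hypothetical helper: score every model, then rescale so the smallest
# distance maps to 1 and a "nothing matched" distance maps towards 0.
score_languages <- function(text, models, penalty = 300) {
  input <- trigram_profile(text)
  d <- sort(vapply(models, function(m) rank_distance(input, m, penalty),
                   numeric(1)))
  worst <- length(input) * penalty
  denom <- max(worst - d[[1]], 1)  # guard against division by zero
  data.frame(language = names(d),
             score = unname(1 - (d - d[[1]]) / denom),
             stringsAsFactors = FALSE)
}

# Toy "models" built from single sentences (illustration only).
models <- list(
  eng = trigram_profile("the quick brown fox jumps over the lazy dog and the cat"),
  deu = trigram_profile("der schnelle braune fuchs springt ueber den faulen hund und die katze")
)
res <- score_languages("the dog and the fox", models)
print(res)
```

Because every score is measured relative to the best distance, the top language always scores exactly 1, just as in the tables in this README.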
"und" (the code for an undetermined language) is returned when the input is too short, that is, shorter than 10 characters by default.
#> [1] "und"
#> [1] "sco"
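The short-input rule can be pictured as a guard in front of the detector. The wrapper below is a hypothetical sketch, not part of franc: it assumes that length is counted in raw characters with nchar() and that the cut-off is a parameter called min_length with the documented default of 10. The always_eng() detector is a stand-in used only to exercise the guard.

```r
# Hypothetical wrapper (not part of franc): report "und" when the input is
# shorter than min_length characters, otherwise defer to the detector.
detect_or_und <- function(text, detect, min_length = 10) {
  if (is.na(text) || nchar(text) < min_length) return("und")
  detect(text)
}

# Stand-in detector that always answers "eng".
always_eng <- function(text) "eng"

short <- detect_or_und("Hello", always_eng)                      # 5 characters
long  <- detect_or_und("Hello, how are you today?", always_eng)
edge  <- detect_or_und("0123456789", always_eng)                 # exactly 10
```

Under this reading an input of exactly 10 characters is long enough, because only inputs shorter than the limit are rejected.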
You can restrict detection to a whitelist of languages, or exclude languages with a blacklist:
#> language score
#> 1 por 1.0000000
#> 2 src 0.8800937
#> 3 glg 0.8702576
#> 4 spa 0.7995316
#> language score
#> 1 por 1.0000000
#> 2 snn 0.8637002
#> 3 bos 0.8168618
#> 4 hrv 0.8103044
#> 5 cat 0.8065574
#> 6 spa 0.7995316
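Conceptually, a whitelist shrinks the candidate set to the listed codes and a blacklist removes codes from it. The sketch below applies a hypothetical filter_languages() helper to the unfiltered scores shown earlier; the whitelist and blacklist values are chosen to match the two filtered results above and are assumptions, not the inputs actually used to produce them.

```r
# Scores copied from the unfiltered output earlier in this README.
scores <- data.frame(
  language = c("por", "src", "glg", "snn", "bos", "hrv"),
  score    = c(1.0000000, 0.8800937, 0.8702576, 0.8637002, 0.8168618, 0.8103044),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only whitelisted codes (when a whitelist is
# given), then drop any blacklisted codes.
filter_languages <- function(langs, whitelist = NULL, blacklist = NULL) {
  if (!is.null(whitelist)) langs <- intersect(langs, whitelist)
  setdiff(langs, blacklist)
}

keep <- filter_languages(scores$language, whitelist = c("por", "src", "glg", "spa"))
print(scores[scores$language %in% keep, ])

drop <- filter_languages(scores$language, blacklist = c("src", "glg"))
print(scores[scores$language %in% drop, ])
```

The surviving rows keep their original scores, which is consistent with the filtered tables above: since scores appear to be measured against the best match, removing languages other than the winner leaves the remaining scores unchanged.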
The R version of franc supports 310 languages. By default, only the 175 languages with more than one million speakers are considered. The min_speakers argument relaxes this restriction, so that more languages are taken into account:
#> language score
#> 1 por 1.0000000
#> 2 src 0.8800937
#> 3 glg 0.8702576
#> 4 snn 0.8637002
#> 5 bos 0.8168618
#> 6 hrv 0.8103044
#> language score
#> 1 lad 1.0000000
#> 2 por 0.9442724
#> 3 pov 0.8788147
#> 4 ast 0.8677576
#> 5 roh 0.8363556
#> 6 src 0.8310482
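The min_speakers argument acts as a threshold on the candidate set, which is presumably why a language with fewer than a million speakers, such as "lad", can top the second table but not the first. The sketch below shows only the filtering logic: the metadata table uses placeholder codes and made-up speaker counts (not real figures), candidate_languages() is a hypothetical helper, and the inclusive comparison (>=) is an assumption.

```r
# Hypothetical metadata: placeholder codes and made-up speaker counts,
# used only to show the filtering logic (not real figures).
meta <- data.frame(
  code     = c("lang_a", "lang_b", "lang_c"),
  speakers = c(5e7, 2e6, 3e4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: languages eligible for detection under a given
# min_speakers threshold (assumed to be inclusive).
candidate_languages <- function(meta, min_speakers = 1e6) {
  meta$code[meta$speakers >= min_speakers]
}

default_set <- candidate_languages(meta)                    # 1 million
relaxed_set <- candidate_languages(meta, min_speakers = 0)  # every language
```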
MIT © Mango Solutions, Titus Wormer, Maciej Ceglowski, Jacob R. Rideout, Kent S. Johnson, Gábor Csárdi