textmineR 3.0.5

This version is a patch. In this version I have

Fixed a bug in CalcHellingerDist() and CalcJSDivergence() that sometimes caused inputs to be overwritten.
Fixed some typos in the vignette for topic modeling
Updated the documentation on FitCtmModel() to better explain how to pass control arguments to CTM’s underlying function.
Enabled return of a tibble or data.frame (instead of only data.frame) in the following functions: SummarizeTopics, GetTopTerms, TermDocFreq (Thanks to Mattias for the PR)

textmineR 3.0.4

This version is a patch. In this version I have

Removed unconditional stripping of compiled code in Makevars, as required by CRAN
Improved outputs of FitLdaModel

textmineR 3.0.3

This version is a patch. In this version I have

fixed an error related to the update.lda_topic_model method.
added a method posterior.lda_topic_model to sample from the posterior of an LDA topic model.

textmineR 3.0.2

This version is a patch. In this version I have

changed some elements of NAMESPACE to pass additional CRAN checks.
added an update method for the lda_topic_model class. This allows users to add documents to an existing model (and even add new topics) without changing the indices of previously-trained topics. e.g. topic 5 is still topic 5.
added a vignette for using tidytext alongside textmineR

textmineR 3.0.1

This version is a patch in response to issues revealed by automatic checks upon submission to CRAN plus an additional issue I encountered along the way.

I have

Used the CRAN template for my MIT LICENSE file
Modified the example of the LabelTopics function to speed up run time for that example
Modified vignettes to run in less time
Added a Makevars file to keep compiled code small on Ubuntu.

Please read below for major updates between v2.x.x and v3.x.x

textmineR 3.0.0

This version significantly changes textmineR.

Several functions that were slated for deletion in version 2.1.3 are now gone.
- RecursiveRbind
- Vec2Dtm
- JSD
- HellDist
- GetPhiPrime
- FormatRawLdaOutput
- Files2Vec
- DepluralizeDtm
- CorrectS
- CalcPhiPrime
FitLdaModel has changed significantly.
- Gibbs sampling is now the only supported training method. The Gibbs sampler no longer wraps lda::lda.collapsed.gibbs.sampler; it is now native to textmineR. It's a little slower, but has additional features.
- Asymmetric priors are supported for both alpha and beta.
- There is an option, optimize_alpha, which updates alpha every 10 iterations based on the value of theta at the current iteration.
- The log likelihood of the data given estimates of phi and theta is optionally calculated every 10 iterations.
- Probabilistic coherence is optionally calculated at the time of model fit.
- R-squared is optionally calculated at the time of model fit.
Supported topic models (LDA, LSA, CTM) are now object-oriented, creating their own S3 classes. These classes have their own predict methods, meaning you do not have to do your own math to make predictions for new documents.
A new function SummarizeTopics has been added.
tm is no longer used for stopwords; we now use the stopwords package. As a result, textmineR no longer has any Java dependency.
Several packages have been moved from “Imports” to “Suggests”. The result is a faster install and lower likelihood of install failure based on packages with system dependencies. (Looking at you, topicmodels!)
Finally, I have changed the textmineR license to the MIT license. Note, however, that some dependencies may have more restrictive licenses. So if you're looking to use textmineR in a commercial project, you may want to dig deeper into what is/isn't permissible.

textmineR 2.1.3

Deprecating functions that will be removed, renamed, or have significant changes to syntax or functionality in the forthcoming textmineR v3.0.
Functions slated for deletion:
- RecursiveRbind
- Vec2Dtm
- JSD
- HellDist
- GetPhiPrime
- FormatRawLdaOutput
- Files2Vec
- DepluralizeDtm
- CorrectS
- CalcPhiPrime
In addition: FitLdaModel is going to change significantly in its functionality and argument calls.

textmineR 2.1.2

Deprecated RecursiveRbind. It depended on a deprecated function from the Matrix package, and the replacement offered by Matrix operates recursively, making this function superfluous.

textmineR 2.1.1

Corrected some code in the vignettes that caused errors on Linux machines.

textmineR 2.1.0

Added vignettes for common use cases of textmineR
Modified averaging for CalcProbCoherence
Updated documentation to CreateTcm

textmineR 2.0.6

Back-end changes to CreateTcm in response to new text2vec API. Functionality is unchanged.
Changes to how the package interfaces with Rcpp

textmineR 2.0.5

Add verbose option to CreateDtm and CreateTcm to suppress status messages.
Add function GetVocabFromDtm to get text2vec vocabulary object from a dgCMatrix document term matrix.

textmineR 2.0.4

Patching errors introduced in version 2.0.3

textmineR 2.0.3

Patches to CreateDtm and CreateTcm in response to updates to text2vec.
More formal update to take advantage of text2vec’s latest optimizations to follow.

textmineR 2.0.2

Patched CreateDtm and CreateTcm. remove_punctuation now supports non-English characters.
Patched TmParallelApply. Added an option to declare the environment to search for your export list. By default, only the local environment is searched, which should cover ~95% of use cases (and avoids a crash on Windows).
Patched FitLdaModel. Use of the ... argument now allows you to control TmParallelApply, lda::lda.collapsed.gibbs.sampler, and topicmodels::LDA without error.
Patched FitCtmModel where the ... argument now goes to topicmodels::CTM’s control argument.
Patched CreateTcm to return objects of class dgCMatrix. This allows you to run functions like FitLdaModel on a TCM.
Switched from irlba to RSpectra for LSA models because RSpectra’s implementation is much faster.

textmineR 2.0.1

Patched CreateDtm and CreateTcm to fix an error that prevented stopwords from being removed.

textmineR 2.0.0

Vec2Dtm is now deprecated in favor of CreateDtm
A function, CreateTcm, now exists to create term co-occurrence matrices
CreateDtm and CreateTcm are implemented with a parallel C++ back end through the text2vec library
- the implementation is much faster! I’ve clocked 2X - 10X speedups, depending on options
- adds external dependencies (a C++ compiler and GNU make) and removes one (Java).
- now all tokens will be included, regardless of length. (tm’s framework silently dropped all tokens of fewer than 3 characters.)
Allow generic stemming and stopwords in CreateDtm & CreateTcm
- Now there is only one argument for stopwords, making it clearer how to use custom or non-English stopwords
- Now the stemming argument allows for passing of stem/lemmatization functions.
Function for fitting correlated topic models
Function to turn a document term matrix to term co-occurrence matrix
Allowed LabelTopics to use unigrams, if you want. (n-grams are still better.)
More robust error checking for CalcTopicModelR2 and CalcLikelihood
All function arguments use "_", not ".", in their names.
CalcPhiPrime replaces (the now deprecated) GetPhiPrime
- Allows you to pass an argument to specify non-uniform probabilities of each document
Similarly, CalcHellingerDist and CalcJSDivergence replace HellDist and JSD. This is to conform to a naming convention where functions are “verbs”.

textmineR 1.7.0

Added modeling capability for latent semantic analysis in FitLsaModel()
Added CalcProbCoherence() function which replaces ProbCoherence() and can calculate probabilistic coherence for the whole phi matrix.
Added data from NIH research grants instead of borrowed data from tm
Removed qcq data
Added variational EM method for FitLdaModel()
Added Cluster2TopicModel(), a function to represent a document clustering as a topic model

textmineR 1.6.0

Add deprecation warning to ProbCoherence
Allow the number of cores to be passed as an argument to every function that uses implicit parallelization
Allow for passing of libraries to TmParallelApply (makes this function truly independent of textmineR)
For Vec2Dtm ensure that stopwords and custom stopwords are lowercased when lower = TRUE
Update README example to use model caches