README

googlenlp

The googlenlp package provides an R interface to Google’s Cloud Natural Language API.

“Google Cloud Natural Language API reveals the structure and meaning of text by offering powerful machine learning models in an easy to use REST API. You can use it to extract information about people, places, events and much more, mentioned in text documents, news articles or blog posts. You can use it to understand sentiment about your product on social media or parse intent from customer conversations happening in a call center or a messaging app.” [source]

There are four main features of the API, all of which are available through this R package [source]:

Syntax Analysis: “Extract tokens and sentences, identify parts of speech (PoS) and create dependency parse trees for each sentence.”
Entity Analysis: “Identify entities and label by types such as person, organization, location, events, products and media.”
Sentiment Analysis: “Understand the overall sentiment expressed in a block of text.”
Multi-Language: “Enables you to easily analyze text in multiple languages including English, Spanish and Japanese.”

Resources

Installation

You can install the development version from GitHub:

devtools::install_github("BrianWeinstein/googlenlp")

Authentication

To use the API, you’ll first need to create a Google Cloud project, enable billing, and get an API key.

Configuration

Load the package and set your API key. There are two ways to do this.

Method A (preferred)

This method adds your API key as a variable in your .Renviron file, so you only need to complete the setup once.

library(googlenlp)

configure_googlenlp() # follow the instructions printed to the console

googlenlp setup instructions:
 1. Your ~/.Renviron file will now open in a new window/tab.
    *** If it doesn't open, run:  file.edit("~/.Renviron") ***
 2. To use the API, you'll first need to create a Google Cloud project and enable billing (https://cloud.google.com/natural-language/docs/getting-started).
 3. Next you'll need to get an API key (https://cloud.google.com/natural-language/docs/common/auth).
 4. In your  ~/.Renviron  file, replace the ENTER_YOUR_API_KEY_HERE with your Google Cloud API key.
 5. Save your ~/.Renviron file.
 6. *** Restart your R session for changes to take effect. ***
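
After restarting R, it can be useful to confirm that the key was actually picked up. The snippet below is a minimal sketch, not part of googlenlp: the helper `api_key_is_set()` is hypothetical, and the variable name `GOOGLE_NLP_API_KEY` is an assumption. Use whatever name appears on the line that `configure_googlenlp()` added to your ~/.Renviron file.

```r
# Hypothetical helper (not a googlenlp function).
# `var` is an ASSUMED variable name; check your ~/.Renviron for the real one.
api_key_is_set <- function(var = "GOOGLE_NLP_API_KEY",
                           placeholder = "ENTER_YOUR_API_KEY_HERE") {
  value <- Sys.getenv(var, unset = "")
  # TRUE only if the variable exists, is non-empty, and is not the placeholder
  nzchar(value) && !identical(value, placeholder)
}

api_key_is_set()
```

If this returns FALSE after a restart, the most likely causes are an unsaved .Renviron file or a placeholder that was never replaced.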

Method B

This method defines your API key as a session-level variable, so you’ll need to set it at the beginning of each R session.

library(googlenlp)

set_api_key("MY_API_KEY") # replace this with your API key

Getting started

Define the text you’d like to analyze.

text <- "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.
         Sundar Pichai said in his keynote that users love their new Android phones."

The annotate_text function analyzes the text’s syntax (sentences and tokens), entities, sentiment, and language, and returns the result as a five-element list.

analyzed <- annotate_text(text_body = text)

str(analyzed, max.level = 1)
#> List of 5
#>  $ sentences        :Classes 'tbl_df', 'tbl' and 'data.frame':   2 obs. of  4 variables:
#>  $ tokens           :Classes 'tbl_df', 'tbl' and 'data.frame':   32 obs. of  17 variables:
#>  $ entities         :Classes 'tbl_df', 'tbl' and 'data.frame':   10 obs. of  8 variables:
#>  $ documentSentiment:'data.frame':   1 obs. of  2 variables:
#>  $ language         : chr "en"

Sentences

“Sentence extraction breaks up the stream of text into a series of sentences.” [API Documentation]

beginOffset indicates the (zero-based) character index of where the sentence begins (with UTF-8 encoding).
The magnitude and score fields quantify each sentence’s sentiment; see the Document Sentiment section for more details.

analyzed$sentences

content	beginOffset	magnitude	score
Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.	0	0.0	0.0
Sundar Pichai said in his keynote that users love their new Android phones.	113	0.6	0.6
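
Because beginOffset is zero-based and R strings are one-based, you need to add 1 when slicing the original text. The sketch below rebuilds the example text and the two relevant columns by hand (no API call) and recovers each sentence from its offset. It assumes ASCII text, as in this example; for text with multi-byte characters the offsets may count bytes rather than characters, so check before relying on this.

```r
# Rebuild the example text: two sentences separated by a newline and 9 spaces
s1 <- "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show."
s2 <- "Sundar Pichai said in his keynote that users love their new Android phones."
text <- paste0(s1, "\n", strrep(" ", 9), s2)

# Hand-copied subset of analyzed$sentences from the output above
sentences <- data.frame(
  content = c(s1, s2),
  beginOffset = c(0, 113),
  stringsAsFactors = FALSE
)

# Convert zero-based offsets to one-based start/end positions
start <- sentences$beginOffset + 1
end <- sentences$beginOffset + nchar(sentences$content)

extracted <- substring(text, start, end)
identical(extracted, sentences$content)
```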

Tokens

“Tokenization breaks the stream of text up into a series of tokens, with each token usually corresponding to a single word. The Natural Language API then processes the tokens and, using their locations within sentences, adds syntactic information to the tokens.” [API Documentation]

lemma indicates the token’s “root” word, and can be useful in standardizing the word within the text.
tag indicates the token’s part of speech.
Additional column definitions are outlined here and here.

analyzed$tokens

content	beginOffset	lemma	tag	aspect	case	form	gender	mood	number	person	proper	reciprocity	tense	voice	dependencyEdge_headTokenIndex	dependencyEdge_label
Google	0	Google	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	SINGULAR	PERSON_UNKNOWN	PROPER	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	7	NSUBJ
,	6	,	PUNCT	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	0	P
headquartered	8	headquarter	VERB	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	PAST	VOICE_UNKNOWN	0	VMOD
in	22	in	ADP	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	2	PREP
Mountain	25	Mountain	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	SINGULAR	PERSON_UNKNOWN	PROPER	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	5	NN
View	34	View	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	SINGULAR	PERSON_UNKNOWN	PROPER	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	3	POBJ
,	38	,	PUNCT	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	0	P
unveiled	40	unveil	VERB	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	INDICATIVE	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	PAST	VOICE_UNKNOWN	7	ROOT
the	49	the	DET	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	11	DET
new	53	new	ADJ	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	11	AMOD
Android	57	Android	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	SINGULAR	PERSON_UNKNOWN	PROPER	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	11	NN
phone	65	phone	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	SINGULAR	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	7	DOBJ
at	71	at	ADP	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	7	PREP
the	74	the	DET	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	16	DET
Consumer	78	Consumer	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	SINGULAR	PERSON_UNKNOWN	PROPER	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	16	NN
Electronic	87	Electronic	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	SINGULAR	PERSON_UNKNOWN	PROPER	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	16	NN
Show	98	Show	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	SINGULAR	PERSON_UNKNOWN	PROPER	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	12	POBJ
.	102	.	PUNCT	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	7	P
Sundar	113	Sundar	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	SINGULAR	PERSON_UNKNOWN	PROPER	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	19	NN
Pichai	120	Pichai	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	SINGULAR	PERSON_UNKNOWN	PROPER	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	20	NSUBJ
said	127	say	VERB	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	INDICATIVE	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	PAST	VOICE_UNKNOWN	20	ROOT
in	132	in	ADP	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	20	PREP
his	135	his	PRON	ASPECT_UNKNOWN	GENITIVE	FORM_UNKNOWN	MASCULINE	MOOD_UNKNOWN	SINGULAR	THIRD	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	23	POSS
keynote	139	keynote	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	SINGULAR	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	21	POBJ
that	147	that	ADP	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	26	MARK
users	152	user	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	PLURAL	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	26	NSUBJ
love	158	love	VERB	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	INDICATIVE	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	PRESENT	VOICE_UNKNOWN	20	CCOMP
their	163	their	PRON	ASPECT_UNKNOWN	GENITIVE	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	PLURAL	THIRD	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	30	POSS
new	169	new	ADJ	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	30	AMOD
Android	173	Android	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	SINGULAR	PERSON_UNKNOWN	PROPER	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	30	NN
phones	181	phone	NOUN	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	PLURAL	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	26	DOBJ
.	187	.	PUNCT	ASPECT_UNKNOWN	CASE_UNKNOWN	FORM_UNKNOWN	GENDER_UNKNOWN	MOOD_UNKNOWN	NUMBER_UNKNOWN	PERSON_UNKNOWN	PROPER_UNKNOWN	RECIPROCITY_UNKNOWN	TENSE_UNKNOWN	VOICE_UNKNOWN	20	P
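
The dependencyEdge_headTokenIndex column can be used to look up each token's head in the dependency tree. The sketch below uses a hand-copied subset of the table above (the first eight tokens, with shortened column names) and assumes the index is zero-based, which is consistent with the output: the ROOT token "unveiled" sits in row 8 and points to index 7, i.e. itself.

```r
# First eight tokens from the output above; `head_index` and `label` are
# shortened stand-ins for dependencyEdge_headTokenIndex and dependencyEdge_label
tokens <- data.frame(
  content = c("Google", ",", "headquartered", "in",
              "Mountain", "View", ",", "unveiled"),
  head_index = c(7, 0, 0, 2, 5, 3, 0, 7),
  label = c("NSUBJ", "P", "VMOD", "PREP", "NN", "POBJ", "P", "ROOT"),
  stringsAsFactors = FALSE
)

# Assumed zero-based index, so add 1 to get the R row number of the head token
tokens$head_content <- tokens$content[tokens$head_index + 1]

tokens[, c("content", "label", "head_content")]
```

Read row by row, this says for example that "Google" is the nominal subject (NSUBJ) of "unveiled", and "Mountain" is a noun modifier (NN) of "View".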

Entities

“Entity Analysis provides information about entities in the text, which generally refer to named ‘things’ such as famous individuals, landmarks, common objects, etc… A good general practice to follow is that if something is a noun, it qualifies as an ‘entity.’” [API Documentation]

entity_type indicates the type of entity (i.e., it classifies the entity as a person, location, consumer good, etc.).
mid provides a “machine-generated identifier” corresponding to the entity’s Google Knowledge Graph entry.
wikipedia_url provides the entity’s Wikipedia URL.
salience indicates the entity’s importance to the entire text. Scores range from 0.0 (less important) to 1.0 (highly important).
Additional column definitions are outlined here.

analyzed$entities

name	entity_type	mid	wikipedia_url	salience	content	beginOffset	mentions_type
Google	ORGANIZATION	/m/045c7b	https://en.wikipedia.org/wiki/Google	0.2557206	Google	0	PROPER
users	PERSON	NA	NA	0.1527633	users	152	COMMON
phone	CONSUMER_GOOD	NA	NA	0.1311989	phone	65	COMMON
Android	CONSUMER_GOOD	/m/02wxtgw	https://en.wikipedia.org/wiki/Android_(operating_system)	0.1224526	Android	57	PROPER
Android	CONSUMER_GOOD	/m/02wxtgw	https://en.wikipedia.org/wiki/Android_(operating_system)	0.1224526	Android	173	PROPER
Sundar Pichai	PERSON	/m/09gds74	https://en.wikipedia.org/wiki/Sundar_Pichai	0.1141411	Sundar Pichai	113	PROPER
Mountain View	LOCATION	/m/0r6c4	https://en.wikipedia.org/wiki/Mountain_View,_California	0.1019596	Mountain View	25	PROPER
Consumer Electronic Show	EVENT	/m/01p15w	https://en.wikipedia.org/wiki/Consumer_Electronics_Show	0.0703438	Consumer Electronic Show	78	PROPER
phones	CONSUMER_GOOD	NA	NA	0.0338317	phones	181	COMMON
keynote	OTHER	NA	NA	0.0175884	keynote	139	COMMON
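
In the output above, "Android" appears twice with the same salience but different beginOffset values, which suggests the table holds one row per mention rather than one row per entity. The sketch below, built from a hand-copied subset of the columns, collapses the table to one row per entity name and keeps the most salient ones. The 0.1 cutoff is an arbitrary value chosen for illustration.

```r
# Hand-copied subset of analyzed$entities from the output above
entities <- data.frame(
  name = c("Google", "users", "phone", "Android", "Android", "Sundar Pichai",
           "Mountain View", "Consumer Electronic Show", "phones", "keynote"),
  entity_type = c("ORGANIZATION", "PERSON", "CONSUMER_GOOD", "CONSUMER_GOOD",
                  "CONSUMER_GOOD", "PERSON", "LOCATION", "EVENT",
                  "CONSUMER_GOOD", "OTHER"),
  salience = c(0.2557206, 0.1527633, 0.1311989, 0.1224526, 0.1224526,
               0.1141411, 0.1019596, 0.0703438, 0.0338317, 0.0175884),
  stringsAsFactors = FALSE
)

# Keep the first row for each entity name (drops repeated mentions)
unique_entities <- entities[!duplicated(entities$name), ]

# In this example, salience across distinct entities sums to about 1
sum(unique_entities$salience)

# Entities above an illustrative salience cutoff
top_entities <- unique_entities[unique_entities$salience >= 0.1, ]
top_entities$name
```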

Document sentiment

“Sentiment analysis attempts to determine the overall attitude (positive or negative) expressed within the text. Sentiment is represented by numerical score and magnitude values.” [API Documentation]

score ranges from -1.0 (negative) to 1.0 (positive), and indicates the “overall emotional leaning of the text”.
magnitude “indicates the overall strength of emotion (both positive and negative) within the given text, between 0.0 and +inf. Unlike score, magnitude is not normalized; each expression of emotion within the text (both positive and negative) contributes to the text’s magnitude (so longer text blocks may have greater magnitudes).”

A note on how to interpret these sentiment values is posted here.

analyzed$documentSentiment

magnitude	score
0.6	0.3
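
One way to read the two values together is to label a text by its score first and use magnitude only to separate "neutral" (little emotion) from "mixed" (positive and negative emotion cancelling out). The helper below is a rough sketch, not part of googlenlp or the API; `describe_sentiment()` is a hypothetical name, and both cutoffs are assumptions that should be tuned for your own data.

```r
# Hypothetical helper; the cutoffs are illustrative assumptions, not API rules
describe_sentiment <- function(score, magnitude,
                               score_cutoff = 0.25,
                               magnitude_cutoff = 1) {
  if (score >= score_cutoff) {
    "positive"
  } else if (score <= -score_cutoff) {
    "negative"
  } else if (magnitude >= magnitude_cutoff) {
    "mixed"      # score near zero but a lot of emotion overall
  } else {
    "neutral"    # score near zero and little emotion overall
  }
}

# Values from the document sentiment output above
describe_sentiment(score = 0.3, magnitude = 0.6)
```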

Language

language indicates the detected language of the document. Only English (“en”), Spanish (“es”) and Japanese (“ja”) are currently supported by the API.

analyzed$language
#> [1] "en"