Natural Language Processing Extension

This extension provides Natural Language Processing capabilities to Siddhi. Functions of the NLP extension are as follows.

Syntax	`<string> nlp:findNameEntityType(<string> entityType, <bool> groupSuccessiveMatch, <string> string-variable)`
Extension Type	Function
Description	This function takes the following input parameters. `entityType`: A user-specified string constant, e.g., `PERSON`, `LOCATION`, `ORGANIZATION`, `MONEY`, `PERCENT`, `DATE`, or `TIME`. `groupSuccessiveMatch`: A user-specified boolean constant that determines whether successive matches of the specified `entityType` in the text are grouped together. `string-variable`: A string or the stream attribute that contains the text. The function returns the entities of the specified type found in the text. If `groupSuccessiveMatch` is `true`, successive words of the same entity type are aggregated into a single result.
Example	`findNameEntityType("PERSON",true,text)` In the above example, if the text attribute contains `"Bill Gates donates £31million to fight Ebola"`, the result is `Bill Gates`. If `groupSuccessiveMatch` is set to `false`, two events are generated: `Bill` and `Gates`.
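
The effect of `groupSuccessiveMatch` can be sketched in a few lines of Python. This is only an illustration of the grouping behaviour described above, not the extension's implementation: the function name `find_entities` and the hand-tagged token list are hypothetical, and a real named-entity recognizer would supply the tags.

```python
from itertools import groupby

def find_entities(tagged_tokens, entity_type, group_successive_match):
    """Return words tagged with entity_type, optionally merging successive matches."""
    if not group_successive_match:
        # One result per matching word.
        return [word for word, tag in tagged_tokens if tag == entity_type]
    results = []
    # groupby yields runs of adjacent tokens that share the same tag.
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag == entity_type:
            results.append(" ".join(word for word, _ in run))
    return results

# Hypothetical, hand-tagged tokens (assumption: "O" marks a non-entity word).
TAGGED = [("Bill", "PERSON"), ("Gates", "PERSON"), ("donates", "O"),
          ("£31million", "MONEY"), ("to", "O"), ("fight", "O"), ("Ebola", "O")]

print(find_entities(TAGGED, "PERSON", True))   # ['Bill Gates']
print(find_entities(TAGGED, "PERSON", False))  # ['Bill', 'Gates']
```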

Syntax	`<string> nlp:findNameEntityTypeViaDictionary(<string> entityType, <string> dictionaryFilePath, <string> string-variable)`
Extension Type	Function
Description	This function takes the following input parameters. `entityType`: A user-specified string constant, e.g., `PERSON`, `LOCATION`, `ORGANIZATION`, `MONEY`, `PERCENT`, `DATE`, or `TIME`. `dictionaryFilePath`: The path to the dictionary file in which the function searches for entries of the specified entity type. The relevant entries for each entity type must be available in the dictionary, as shown in the example below. `string-variable`: A string or the stream attribute that contains the text. The function returns the entities in the text that match dictionary entries of the specified type.
Example	`findNameEntityTypeViaDictionary("PERSON","dictionary.xml",text)` In the above example, if the text attribute contains `"Bill Gates donates £31million to fight Ebola"`, and the dictionary consists of the above entries (i.e., the entries of the example in the Description), the result is `"Bill"`.
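
The dictionary lookup can be pictured with the following Python sketch. It is an assumption-laden illustration rather than the extension's behaviour: the dictionary is modelled as an in-memory mapping instead of the XML file (whose format is not reproduced here), the entries are hypothetical, and whitespace tokenization with exact, case-sensitive matching is assumed.

```python
def find_entities_via_dictionary(dictionary, entity_type, text):
    """Return the words in text that are listed under entity_type in the dictionary."""
    entries = dictionary.get(entity_type, set())
    # Assumption: words are separated by whitespace and matched exactly.
    return [word for word in text.split() if word in entries]

# Hypothetical dictionary contents, chosen only for illustration.
DICTIONARY = {
    "PERSON": {"Bill", "Addison"},
    "LOCATION": {"Africa"},
}

TEXT = "Bill Gates donates £31million to fight Ebola"
print(find_entities_via_dictionary(DICTIONARY, "PERSON", TEXT))  # ['Bill']
```

Because `Gates` is not listed under `PERSON` in this hypothetical dictionary, only `Bill` is returned.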

Syntax	`<string> text, <string> subject, <string> object, <string> verb nlp:findRelationshipByVerb(<string> verb, <string> string-variable)`
Extension Type	Function
Description	`findRelationshipByVerb` takes a user-specified string constant as the verb and a text stream, and returns the whole text together with the subject, object, and verb of the relationship based on the specified verb. This information can be extracted only if the specified verb occurs in the text. However, the tense of the verb does not have to match. The input parameters are as follows. `verb`: A user-specified string constant. `string-variable`: A string or the stream attribute that contains the text.
Example	`findRelationshipByVerb("say", "Information just reaching us says another Liberian With Ebola Arrested At Lagos Airport")` returns the following: the whole text, `Information` as the subject, `Liberian` as the object, and `says` as the verb.

Syntax	`<string> text, <string> subject, <string> object, <string> verb nlp:findRelationshipByRegex(<string> regex, <string> string-variable)`
Extension Type	Function
Description	This function returns the whole text, subject, object and verb from the text stream that matches the named nodes of the Semgrex pattern.
Example	`findRelationshipByRegex('{}=verb >/nsubj|agent/ {}=subject >/dobj/ {}=object', "gates foundation donates $50M in support of #Ebola relief")` returns the following: the whole text, `"foundation"` as the subject, `"$"` as the object, and `"donates"` as the verb.

Syntax	`<string> text, <string> match, <string> object, <string> verb nlp:findSemgrexPattern(<string> regex, <string> string-variable)`
Extension Type	Function
Description	The `findSemgrexPattern` function returns the whole text, the match, and the named nodes and named relations of the Semgrex pattern that match the text stream. This function takes the following input parameters. `regex`: A user-specified regular expression that follows the Semgrex pattern syntax. `string-variable`: A string or the stream attribute that contains the text.
Example	`findSemgrexPattern('{lemma:die} >/.subj|num./=reln {}=diedsubject', "Sierra Leone doctor dies of Ebola after failed evacuation.")` In this example, the function searches for words with the lemma `die` that govern any subject or numeric relation. The dependent is named `diedsubject`, and the relation is named `reln`. The query therefore returns an output stream containing the full match of this expression, i.e., the governing word whose lemma is `die`, along with the named node and named relation for each match it finds. The output stream contains the following elements: the whole text, `dies` as the match, `"nsubj"` as `reln`, and `doctor` as `diedsubject`.

Syntax	`<string> text, <string> match, <string> group_1, etc. nlp:findTokensRegexPattern(<string> regex, <string> string-variable)`
Extension Type	Function
Description	`findTokensRegexPattern` returns the whole text, the match, and each capturing group (`group_1`, `group_2`, etc.) from the text stream that matches the specified regular expression. This function takes the following input parameters. `regex`: A user-specified regular expression that follows the TokensRegex pattern syntax. `string-variable`: A string or the stream attribute that contains the text.
Example	`findTokensRegexPattern('([ner:/PERSON|ORGANIZATION|LOCATION/]+) (?:[]* [lemma:donate]) ([ner:MONEY]+)', text)` defines three groups. The first group matches one or more successive words that are entities of type `PERSON`, `ORGANIZATION`, or `LOCATION`. The middle group is a non-capturing group. The third group matches one or more successive entities of type `MONEY`. This function returns the following: the whole text, `"Paul Allen donates $ 9million"` as the match, `"Paul Allen"` as `group_1`, and `"$ 9million"` as `group_2`.
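
How the capturing and non-capturing groups map to `group_1` and `group_2` can be illustrated with an analogy in Python's standard `re` module. This is not TokensRegex and not how the extension works internally. The sketch assumes each token is written as `word/TAG` so that an ordinary character-level regex can see the entity tag, it approximates `[lemma:donate]` with `donates?`, and the tagged sentence and helper names are hypothetical.

```python
import re

# Hypothetical input: each token carries its entity tag as word/TAG.
TAGGED_TEXT = ("Paul/PERSON Allen/PERSON donates/O $/MONEY 9million/MONEY "
               "to/O fight/O Ebola/O")

PATTERN = re.compile(
    r"((?:\S+/(?:PERSON|ORGANIZATION|LOCATION) )+)"  # group 1: one or more entity tokens
    r"(?:(?:\S+/\S+ )*?donates?/\S+ )"               # non-capturing: any tokens, then donate(s)
    r"((?:\S+/MONEY ?)+)"                            # group 2: one or more MONEY tokens
)

def words(tagged):
    """Drop the /TAG suffix from every token and rejoin the words."""
    return " ".join(token.rsplit("/", 1)[0] for token in tagged.split())

def find_groups(tagged_text):
    """Return (match, group_1, group_2) as plain words, or None if nothing matches."""
    match = PATTERN.search(tagged_text)
    if match is None:
        return None
    return words(match.group(0)), words(match.group(1)), words(match.group(2))

print(find_groups(TAGGED_TEXT))
# ('Paul Allen donates $ 9million', 'Paul Allen', '$ 9million')
```

The middle part of the pattern takes part in the overall match but receives no group number, which is why the `MONEY` tokens appear as the second group rather than the third.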