This extension provides Natural Language Processing capabilities to Siddhi. Functions of the NLP extension are as follows.
Find Name Entity Type function
Syntax | <string> nlp:findNameEntityType(<string> entityType, <bool> groupSuccessiveMatch, <string> string-variable ) |
---|
Extension Type | Function |
---|
Description | This function uses the following input parameters.
entityType
: A user-specified string constant, e.g., PERSON , LOCATION , ORGANIZATION , MONEY , PERCENT , DATE or TIME .
groupSuccessiveMatch
: A user-specified boolean constant that determines whether successive matches of the specified entityType in the text stream are grouped.
streamAttribute
: A string, or the stream attribute that contains the text stream.
This function returns the entities in the text. If groupSuccessiveMatch is set to true , the result aggregates successive words of the same entity type. |
---|
Example | findNameEntityType("PERSON",true,text)
In the above example, if the text attribute contains "Bill Gates donates £31million to fight Ebola" , the result is Bill Gates . If groupSuccessiveMatch is set to false , two events are generated: Bill and Gates . |
---|
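The effect of groupSuccessiveMatch can be illustrated with a minimal Python sketch. This is not the extension's implementation: the helper `find_entities` is hypothetical, and the entity tags are written by hand rather than produced by a named-entity tagger.

```python
def find_entities(tagged_tokens, entity_type, group_successive_match):
    """Return entity strings of the requested type from (word, tag) pairs."""
    results = []
    current = []  # words of the run currently being accumulated
    for word, tag in tagged_tokens:
        if tag == entity_type:
            if group_successive_match:
                current.append(word)   # extend the current run
            else:
                results.append(word)   # emit every matching word separately
        elif current:
            results.append(" ".join(current))  # a run just ended, flush it
            current = []
    if current:
        results.append(" ".join(current))      # flush a run that ends the text
    return results


# Hand-tagged tokens (assumed tags, not real tagger output).
tagged = [
    ("Bill", "PERSON"), ("Gates", "PERSON"), ("donates", "O"),
    ("£31million", "MONEY"), ("to", "O"), ("fight", "O"), ("Ebola", "O"),
]

grouped = find_entities(tagged, "PERSON", True)
ungrouped = find_entities(tagged, "PERSON", False)
```

With grouping enabled the two successive PERSON tokens form a single result; with grouping disabled each token is a separate result, which corresponds to the two events described in the example.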
Find Name Entity Type Via Dictionary function
Syntax | <string> nlp:findNameEntityTypeViaDictionary(<string> entityType, <string> dictionaryFilePath, <string> string-variable ) |
---|
Extension Type | Function |
---|
Description | This function uses the following input parameters.
entityType
: This is a user-specified string constant. e.g., PERSON , LOCATION , ORGANIZATION , MONEY , PERCENT , DATE or TIME
dictionaryFilePath
: The path to the dictionary in which the function searches for the specified entries. The relevant entries for the entity types should be available in the dictionary as shown in the example below.
streamAttribute
: A string, or the stream attribute that contains the text stream.
This function returns the entities in the text that match the dictionary entries for the specified entity type. |
---|
Example | findNameEntityTypeViaDictionary("PERSON","dictionary.xml",text)
In the above example, if the text attribute contains "Bill Gates donates £31million to fight Ebola" and the dictionary consists of the above entries (i.e., the entries of the example in the Description), the result is "Bill" . |
---|
Find Relationship By Verb function
Syntax | <string> text, <string> subject, <string> object, <string> verb nlp:findRelationshipByVerb(<string> verb, <string> string-variable ) |
---|
Extension Type | Function |
---|
Description | findRelationshipByVerb takes a user-specified string constant as the verb and a text stream, and returns the whole text together with the subject, object and verb of the relationship built on that verb. This information can be extracted only if the specified verb exists in the text stream. However, the tense of the verb does not have to match.
The input parameters are as follows.
- verb : A user-specified string constant.
- string-variable : A string, or the stream attribute that contains the text stream. |
---|
Examples |
findRelationshipByVerb("say", "Information just reaching us says another Liberian With Ebola Arrested At Lagos Airport") returns the following.
- The whole text
- Information as the subject
- Liberian as the object
- says as the verb |
---|
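The statement that the tense of the verb does not have to match can be read as a comparison of base forms (lemmas). The following Python sketch illustrates only that verb-matching step, not subject or object extraction. The `LEMMAS` table and both helper functions are hypothetical stand-ins for a real lemmatizer.

```python
# Hypothetical lemma table; a real pipeline would use a lemmatizer instead.
LEMMAS = {
    "says": "say", "said": "say", "saying": "say",
    "reaching": "reach", "arrested": "arrest",
}


def lemma(word):
    """Return the base form of a word, or the lowercased word if unknown."""
    lowered = word.lower()
    return LEMMAS.get(lowered, lowered)


def find_verb_tokens(verb, text):
    """Return the tokens of text whose lemma equals the lemma of verb."""
    target = lemma(verb)
    return [token for token in text.split() if lemma(token) == target]


text = ("Information just reaching us says another Liberian "
        "With Ebola Arrested At Lagos Airport")

matches = find_verb_tokens("say", text)        # "says" shares the lemma "say"
no_match = find_verb_tokens("donate", text)    # verb absent, nothing extracted
```

Here the constant "say" matches the token "says" although the surface forms differ, and a verb that does not occur in the text yields no result.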
Find Relationship By Regex function
Syntax | <string> text, <string> subject, <string> object, <string> verb nlp:findRelationshipByRegex(<string> regex, <string> string-variable ) |
---|
Extension Type | Function |
---|
Description | This function returns the whole text, subject, object and verb from the text stream that matches the named nodes of the Semgrex pattern. |
---|
Example | findRelationshipByRegex('{}=verb >/nsubj|agent/ {}=subject >/dobj/ {}=object', "gates foundation donates $50M in support of #Ebola relief") returns the following.
- The whole text
- "foundation" as the subject
- "$" as the object
- "donates" as the verb |
---|
Find Semgrex Pattern function
Syntax | <string> text, <string> match, <string> object, <string> verb nlp:findSemgrexPattern(<string> regex, <string> string-variable ) |
---|
Extension Type | Function |
---|
Description | The findSemgrexPattern function returns the whole text, subject, object and verb from the text stream that matches the named nodes of the Semgrex pattern. This function uses the following input parameters.
- regex : A user-specified regular expression that follows the Semgrex pattern syntax.
- string-variable : A string, or the stream attribute that contains the text stream. |
---|
Example | findSemgrexPattern('{lemma:die} >/.*subj|num.*/=reln {}=diedsubject', "Sierra Leone doctor dies of Ebola after failed evacuation.")
In this example, the function searches for words whose lemma is die and that govern any subject or numeric relation. The dependent is named diedsubject , and the relation is named reln . The query therefore returns an output stream that has the full match of this expression, i.e., the governing word whose lemma is die , along with the named node and named relation for each match it finds. The output stream contains the following elements.
- The whole text
- dies as the match
- "nsubj" as reln
- doctor as diedsubject |
---|
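How the pattern in the example selects its governor, relation and dependent can be sketched in Python over a hand-written dependency parse. This is only an illustration under stated assumptions: the `EDGES` list and `LEMMA` table are written by hand (a real parse comes from a dependency parser), and `match_die_subject` is a hypothetical helper, not part of the extension.

```python
import re

# Hand-written parse of "Sierra Leone doctor dies of Ebola after failed
# evacuation." (assumed edges, reduced to the ones relevant here).
LEMMA = {"dies": "die"}
EDGES = [  # (governor, relation, dependent)
    ("dies", "nsubj", "doctor"),
    ("doctor", "nn", "Leone"),
    ("dies", "prep_of", "Ebola"),
]


def match_die_subject(edges, lemmas):
    """Mimic '{lemma:die} >/.*subj|num.*/=reln {}=diedsubject' on edge triples."""
    relation = re.compile(r".*subj|num.*")
    results = []
    for governor, rel, dependent in edges:
        # The governor must have the lemma "die" and the whole relation
        # name must match the relation regex.
        if lemmas.get(governor, governor) == "die" and relation.fullmatch(rel):
            results.append(
                {"match": governor, "reln": rel, "diedsubject": dependent}
            )
    return results


result = match_die_subject(EDGES, LEMMA)
```

Only the nsubj edge satisfies both conditions, so the sketch yields dies as the match, nsubj as reln and doctor as diedsubject, in line with the elements listed in the example.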
Find Tokens Regex Pattern function
Syntax | <string> text, <string> match, <string> group_1, etc. nlp:findTokensRegexPattern(<string> regex, <string> string-variable ) |
---|
Extension Type | Function |
---|
Description | findTokensRegexPattern returns the whole text, the match, and each captured group ( group_1 , group_2 , etc.) from the text stream that matches the specified TokensRegex pattern.
This function uses the following input parameters.
- regex : A user-specified regular expression that follows the TokensRegex pattern syntax.
- string-variable : A string, or the stream attribute that contains the text stream. |
---|
Example | findTokensRegexPattern('([ner:/PERSON|ORGANIZATION|LOCATION/]+) (?:[]* [lemma:donate]) ([ner:MONEY]+)', text) defines three groups:
- The first group looks for words that are entities of type PERSON , ORGANIZATION or LOCATION , with one or more successive words of the same kind.
- The middle group is a non-capturing group.
- The third group looks for one or more successive entities of type MONEY .
This function returns the following.
- The whole text
- " Paul Allen donates $ 9million " as the match
- " Paul Allen" as group_1
- "$ 9million" as group_2 |
---|
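The numbering of the groups in the example (the non-capturing middle group receives no number, so the MONEY group becomes group_2) can be illustrated with a Python analogue. This is a sketch under assumptions, not how the extension works internally: each token is reduced to a one-letter code, an ordinary regular expression is run over the code string, and the token annotations are written by hand. The helpers `encode` and `find_groups` are hypothetical.

```python
import re


def encode(tokens):
    """Map each (word, ner, lemma) token to a one-letter class code."""
    codes = []
    for _word, ner, lemma in tokens:
        if lemma == "donate":
            codes.append("D")   # stands for [lemma:donate]
        elif ner in ("PERSON", "ORGANIZATION", "LOCATION"):
            codes.append("P")   # stands for [ner:/PERSON|ORGANIZATION|LOCATION/]
        elif ner == "MONEY":
            codes.append("M")   # stands for [ner:MONEY]
        else:
            codes.append("O")   # any other token
    return "".join(codes)


# Character-level analogue of the token pattern in the example:
# two capturing groups around one non-capturing group.
PATTERN = re.compile(r"(P+)(?:.*D)(M+)")


def find_groups(tokens):
    """Return the match and numbered groups as strings, or None."""
    found = PATTERN.search(encode(tokens))
    if found is None:
        return None

    def words(start, end):
        return " ".join(token[0] for token in tokens[start:end])

    return {
        "match": words(*found.span(0)),
        "group_1": words(*found.span(1)),
        "group_2": words(*found.span(2)),
    }


# Hand-annotated tokens (assumed tags and lemmas).
tokens = [
    ("Paul", "PERSON", "Paul"), ("Allen", "PERSON", "Allen"),
    ("donates", "O", "donate"),
    ("$", "MONEY", "$"), ("9million", "MONEY", "9million"),
]

result = find_groups(tokens)
```

Because one code character stands for one token, the spans reported by the regular expression can be used directly as token indices when the matched words are joined back together.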