tokenizer_emoticons: tokenizers for emoticons
Functions for tokenizing text into emoticons, or into lowercase words and emoticons.
from mlxtend.text import tokenizer_emoticons
from mlxtend.text import tokenizer_words_and_emoticons
Overview
These functions tokenize text for natural language processing tasks, such as building a bag-of-words model for text classification.
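As a minimal sketch of the bag-of-words idea, the snippet below counts tokens in documents that have already been tokenized, using only the standard library. The token lists are hand-written stand-ins for tokenizer output (the first one matches the output shown in the examples on this page), and `bag_of_words` is a hypothetical helper for illustration, not part of mlxtend.

```python
from collections import Counter


def bag_of_words(token_lists):
    """Return (vocabulary, count vectors) for a list of token lists.

    Hypothetical helper for illustration; not an mlxtend function.
    """
    # Vocabulary: every distinct token across all documents, sorted.
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # One count per vocabulary entry; missing tokens count as 0.
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors


# Assumed tokenizer output for two short documents.
docs = [
    ['this', 'is', 'a', 'test', ':)', ':(', ':-)'],
    ['a', 'good', 'test', ':)', ':)'],
]
vocab, vectors = bag_of_words(docs)
print(vocab)
print(vectors)
```

Each row of `vectors` lines up with `vocab`, so emoticons become ordinary features next to words, which can be useful in tasks such as sentiment classification.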
Example 1 - Extract Emoticons
from mlxtend.text import tokenizer_emoticons
tokenizer_emoticons('</a>This :) is :( a test :-)!')
[':)', ':(', ':-)']
Example 2 - Extract Words and Emoticons
from mlxtend.text import tokenizer_words_and_emoticons
tokenizer_words_and_emoticons('</a>This :) is :( a test :-)!')
['this', 'is', 'a', 'test', ':)', ':(', ':-)']
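To make the behavior in the two examples above more concrete, here is a rough regex-based sketch that reproduces the same outputs for this input. The function names `sketch_emoticons` and `sketch_words_and_emoticons` and the regular expressions are assumptions made for illustration. They are not necessarily how mlxtend implements its tokenizers and may differ on other inputs.

```python
import re

# Assumed patterns, for illustration only.
TAG_PATTERN = re.compile(r'<[^>]*>')               # HTML-like tags such as </a>
EMOTICON_PATTERN = re.compile(r'[:;=]-?[()DP]')    # e.g. :) :( :-) ;D =P


def sketch_emoticons(text):
    """Return the emoticons found in text, in order of appearance."""
    text = TAG_PATTERN.sub('', text)
    return EMOTICON_PATTERN.findall(text)


def sketch_words_and_emoticons(text):
    """Return lowercase words followed by the emoticons found in text."""
    text = TAG_PATTERN.sub('', text)
    emoticons = EMOTICON_PATTERN.findall(text)
    # Replace every run of non-word characters with a space, then split.
    words = re.sub(r'\W+', ' ', text.lower()).split()
    return words + emoticons


sample = '</a>This :) is :( a test :-)!'
print(sketch_emoticons(sample))
print(sketch_words_and_emoticons(sample))
```

Note that in this sketch the emoticons are appended after all of the words rather than kept at their original positions, which is consistent with the output of Example 2.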
API
tokenizer_emoticons(text)
Return emoticons from text
Examples
>>> tokenizer_emoticons('</a>This :) is :( a test :-)!')
[':)', ':(', ':-)']
For usage examples, please see
https://rasbt.github.io/mlxtend/user_guide/text/tokenizer_emoticons/
tokenizer_words_and_emoticons(text)
Convert text to lowercase words and emoticons.
Examples
>>> tokenizer_words_and_emoticons('</a>This :) is :( a test :-)!')
['this', 'is', 'a', 'test', ':)', ':(', ':-)']
For more usage examples, please see
https://rasbt.github.io/mlxtend/user_guide/text/tokenizer_words_and_emoticons/