tokenizer_emoticons: tokenizers for emoticons

Functions that tokenize text into emoticons, or into lowercase words and emoticons.

from mlxtend.text import tokenizer_emoticons
from mlxtend.text import tokenizer_words_and_emoticons

Overview

These functions tokenize text for natural language processing tasks, such as building a bag-of-words model for text classification.
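As a rough illustration of where a tokenizer fits in a bag-of-words pipeline, the sketch below counts tokens per document and turns the counts into vectors. The helper `bag_of_words` and the placeholder `whitespace_tokenizer` are hypothetical names introduced here, not part of mlxtend; with mlxtend installed, `tokenizer_words_and_emoticons` could be passed as the tokenizer instead.

```python
from collections import Counter


def bag_of_words(documents, tokenizer):
    """Count token occurrences per document using the given tokenizer."""
    return [Counter(tokenizer(doc)) for doc in documents]


# Placeholder tokenizer for illustration only (an assumption, not mlxtend code).
def whitespace_tokenizer(text):
    return text.lower().split()


docs = ["Great movie :)", "Bad movie :("]
counts = bag_of_words(docs, whitespace_tokenizer)

# Shared vocabulary across all documents, in sorted order.
vocabulary = sorted(set().union(*counts))

# One count vector per document, aligned with the vocabulary.
vectors = [[c[token] for token in vocabulary] for c in counts]
```

Keeping the tokenizer as a parameter makes it easy to swap in a different tokenization scheme without touching the counting logic.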

Example 1 - Extract Emoticons

from mlxtend.text import tokenizer_emoticons

tokenizer_emoticons('</a>This :) is :( a test :-)!')

[':)', ':(', ':-)']

Example 2 - Extract Words and Emoticons

from mlxtend.text import tokenizer_words_and_emoticons

tokenizer_words_and_emoticons('</a>This :) is :( a test :-)!')

['this', 'is', 'a', 'test', ':)', ':(', ':-)']
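The output above can be reproduced with a few regular expressions. The following is a minimal sketch under stated assumptions, not necessarily how mlxtend implements these functions: the names `sketch_emoticons` and `sketch_words_and_emoticons` are hypothetical, and the emoticon pattern (eyes `:`, `;`, or `=`, an optional `-` nose, and a mouth `)`, `(`, `D`, or `P`) is an assumed, deliberately small set.

```python
import re

# Assumed emoticon pattern: eyes, optional nose, mouth.
EMOTICON_PATTERN = re.compile(r'[:;=]-?[()DP]')
# Matches HTML-like tags such as </a> so they can be stripped first.
TAG_PATTERN = re.compile(r'<[^>]*>')


def sketch_emoticons(text):
    """Return the emoticons found in text, in order of appearance."""
    text = TAG_PATTERN.sub('', text)
    return EMOTICON_PATTERN.findall(text)


def sketch_words_and_emoticons(text):
    """Return lowercase words followed by the emoticons found in text."""
    text = TAG_PATTERN.sub('', text)
    emoticons = EMOTICON_PATTERN.findall(text)
    # \w+ keeps runs of word characters and drops punctuation.
    words = re.findall(r'\w+', text.lower())
    return words + emoticons


sample = '</a>This :) is :( a test :-)!'
print(sketch_emoticons(sample))
print(sketch_words_and_emoticons(sample))
```

Note that in this sketch the emoticons are appended after the words rather than kept in their original positions, which matches the ordering shown in the example output.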

API

tokenizer_emoticons(text)

Return emoticons from text

Examples

    >>> tokenizer_emoticons('</a>This :) is :( a test :-)!')
    [':)', ':(', ':-)']

    For usage examples, please see
    https://rasbt.github.io/mlxtend/user_guide/text/tokenizer_emoticons/

tokenizer_words_and_emoticons(text)

Convert text to lowercase words and emoticons.

Examples

    >>> tokenizer_words_and_emoticons('</a>This :) is :( a test :-)!')
    ['this', 'is', 'a', 'test', ':)', ':(', ':-)']

    For more usage examples, please see
    https://rasbt.github.io/mlxtend/user_guide/text/tokenizer_words_and_emoticons/
