Word Occurrence Algorithm

The word occurrence algorithm is a widely used technique in natural language processing, text mining, and information retrieval for counting how often each word appears in a text or a collection of documents. These counts help extract structure from unstructured data: frequent words point to common themes, trends, and patterns in large corpora, and a word's relative frequency is a rough signal of its importance that downstream tasks such as text summarization, sentiment analysis, and topic modeling can build on.

The basic idea is to tokenize the input text into individual words and keep a data structure, such as a dictionary or hash table, that maps each word to its count. The algorithm starts with an empty dictionary, iterates over the words in the text, and increments the count of each word it encounters. To account for variations in word forms, stemming or lemmatization can reduce words to a base form, and stop words can be removed to focus on more meaningful content. Once the entire text has been processed, the word frequencies can be sorted, analyzed, or visualized to gain insight into the content and structure of the text.
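The normalization and stop-word steps described above can be sketched as follows. This is a minimal illustration that is not part of the original module: it lowercases the text, tokenizes it with a simple regular expression (an assumption; real tokenizers handle many more cases), drops words found in a small hypothetical `STOP_WORDS` set, and counts the remaining words with `collections.Counter`. Stemming and lemmatization are left out because they usually rely on a dedicated NLP library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# real pipelines usually use a curated list.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "is", "in"})


def normalized_word_counts(text: str, stop_words: frozenset = STOP_WORDS) -> Counter:
    """Count words after lowercasing the text and dropping stop words.

    Tokens are runs of lowercase letters, digits, and apostrophes; this regex
    is an illustrative assumption, not a complete tokenizer.
    """
    # Lowercase first so "The" and "the" are counted as the same word
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Counter increments a count per token, like the defaultdict loop
    return Counter(token for token in tokens if token not in stop_words)


if __name__ == "__main__":
    sample = "The cat and the hat. The cat sat!"
    print(normalized_word_counts(sample))
```

For the sample sentence, "the" and "and" are discarded and punctuation is ignored, leaving "cat" with a count of 2 and "hat" and "sat" with 1 each.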
# Created by sarathkaul on 17/11/19
# Modified by Arkadip Bhattacharya(@darkmatter18) on 20/04/2020
from collections import defaultdict


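def word_occurence(sentence: str) -> dict:
    """
    Count how many times each whitespace-separated word occurs in a sentence.

    The text is split with str.split(), so runs of whitespace act as a single
    separator; counting is case-sensitive, so "a" and "A" are distinct words.
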
    >>> from collections import Counter
    >>> SENTENCE = "a b A b c b d b d e f e g e h e i e j e 0"
    >>> occurence_dict = word_occurence(SENTENCE)
    >>> all(occurence_dict[word] == count for word, count
    ...     in Counter(SENTENCE.split()).items())
    True
    >>> dict(word_occurence("Two  spaces"))
    {'Two': 1, 'spaces': 1}
    """
    # Map each word to its count; defaultdict(int) starts unseen words at 0
    occurrence = defaultdict(int)
    for word in sentence.split():
        occurrence[word] += 1
    return occurrence


if __name__ == "__main__":
    for word, count in word_occurence("INPUT STRING").items():
        print(f"{word}: {count}")
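The description notes that the final frequencies can be sorted and used to judge the relative importance of words. A minimal sketch of that last step, assuming whitespace tokenization as in `word_occurence` and using `collections.Counter.most_common` for the sort; the function name `top_words` and the parameter `k` are illustrative and not part of the original module:

```python
from __future__ import annotations

from collections import Counter


def top_words(text: str, k: int = 3) -> list[tuple[str, int, float]]:
    """Return the k most frequent whitespace-separated words.

    Each entry is (word, count, relative_frequency), where the relative
    frequency is the count divided by the total number of tokens.
    """
    counts = Counter(text.split())
    total = sum(counts.values())
    # most_common sorts by count, highest first; equal counts keep
    # first-encountered order. An empty text yields an empty list,
    # so the division below never sees total == 0.
    return [(word, count, count / total) for word, count in counts.most_common(k)]
```

For example, `top_words("b a b c b a", 2)` returns `[('b', 3, 0.5), ('a', 2, 0.3333333333333333)]`. Dividing by the total token count makes frequencies comparable across documents of different lengths.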
