Gaussian Naive Bayes Algorithm
The Gaussian Naive Bayes algorithm is a probabilistic machine learning algorithm based on Bayes' theorem, primarily used for classification problems. It is called "naive" because it assumes that the features in the dataset are conditionally independent given the class, meaning that the value of one feature carries no information about another once the class is known. This simplification allows the algorithm to perform well on various tasks, even though the independence assumption rarely holds exactly in real-world data. Gaussian Naive Bayes specifically deals with continuous data, assuming that the continuous values associated with each class are distributed according to a Gaussian (normal) distribution.
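The Gaussian assumption can be made concrete: for each class, the likelihood of observing a feature value x is the normal density evaluated with that class's sample mean and variance. A minimal sketch, where the mean and variance values are purely illustrative:

```python
import math


def gaussian_likelihood(x: float, mean: float, var: float) -> float:
    """Normal density N(x; mean, var), used as P(feature = x | class)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Illustrative values: suppose petal lengths for one class have
# sample mean 1.46 and sample variance 0.03.
p = gaussian_likelihood(1.5, mean=1.46, var=0.03)
```

Note that this is a density, not a probability, so values above 1 are perfectly valid when the variance is small.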
The Gaussian Naive Bayes algorithm works by first computing the prior probabilities for each class, which represent the likelihood of each class occurring in the dataset. Then, it calculates the likelihood of observing a particular feature value given a specific class, assuming that the feature values are distributed according to a Gaussian distribution. To classify a new data point, the algorithm computes the posterior probability of each class given the feature values of the data point, and assigns the class with the highest posterior probability. Since Gaussian Naive Bayes is computationally efficient and can handle large datasets, it is often used as a baseline method for text classification, spam filtering, and other applications where features can be assumed to be conditionally independent.
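The steps above — estimating class priors, fitting per-class Gaussian likelihoods, and taking the argmax over posteriors — can be sketched from scratch with NumPy. Function and variable names here are illustrative, and log-probabilities are used for numerical stability:

```python
import numpy as np


def fit_gaussian_nb(features: np.ndarray, labels: np.ndarray):
    """Estimate per-class priors, feature means, and variances."""
    classes = np.unique(labels)
    priors, means, variances = {}, {}, {}
    for c in classes:
        rows = features[labels == c]
        priors[c] = len(rows) / len(features)
        means[c] = rows.mean(axis=0)
        variances[c] = rows.var(axis=0) + 1e-9  # small term avoids division by zero
    return classes, priors, means, variances


def predict_gaussian_nb(x, classes, priors, means, variances):
    """Return the class with the highest log-posterior for sample x."""

    def log_posterior(c):
        # log N(x; mean, var) summed over features, plus the log prior
        log_lik = -0.5 * np.sum(
            np.log(2 * np.pi * variances[c]) + (x - means[c]) ** 2 / variances[c]
        )
        return np.log(priors[c]) + log_lik

    return max(classes, key=log_posterior)


# Toy data: two well-separated one-dimensional classes.
X = np.array([[1.0], [1.2], [0.9], [5.0], [5.2], [4.8]])
y = np.array([0, 0, 0, 1, 1, 1])
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(np.array([1.1]), *model))  # → 0
```

The sklearn example below follows the same logic; `GaussianNB` simply packages these estimates behind the standard `fit`/`predict` interface.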
# Gaussian Naive Bayes Example
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt


def main():
    """
    Gaussian Naive Bayes example using the sklearn GaussianNB estimator.
    The Iris dataset is used to demonstrate the algorithm.
    """
    # Load the Iris dataset
    iris = load_iris()

    # Split the dataset into train and test data
    X = iris["data"]  # features
    Y = iris["target"]
    x_train, x_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=1
    )

    # Fit a Gaussian Naive Bayes model
    nb_model = GaussianNB()
    nb_model.fit(x_train, y_train)

    # Display the normalized confusion matrix
    # (plot_confusion_matrix was removed in scikit-learn 1.2;
    # ConfusionMatrixDisplay.from_estimator is its replacement)
    ConfusionMatrixDisplay.from_estimator(
        nb_model,
        x_test,
        y_test,
        display_labels=iris["target_names"],
        cmap="Blues",
        normalize="true",
    )
    plt.title("Normalized Confusion Matrix - IRIS Dataset")
    plt.show()


if __name__ == "__main__":
    main()