Sentiment Analysis of Comments with Pandas Data Processing

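Libraries used: Pandas, SnowNLP, matplotlib
Requirements:
Use pandas to process the data and extract the commenter level and the comment content
Group the comments by commenter level, from level 6 down to level 0, run sentiment analysis on the comments, write the results to new tables, and look at the share of positive and negative comments for each level
Development environment: Jupyter Notebook, Pandas == 1.5.3, matplotlib == 3.7.1, SnowNLP == 0.12.3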

Data processing

Source data structure

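import pandas as pd

df = pd.read_excel("CommentData.xlsx")                        # read the source data
df1 = df[["评论作者 UID","评论作者等级", "评论内容"]]            # keep the commenter UID, commenter level and comment content columns
# Split the comments into one table per level (levels 6, 5, 4, 3, 2 and 0).
# .copy() makes each table independent, so adding a column later does not raise SettingWithCopyWarning.
sixLevelComment = df1.loc[(df1["评论作者等级"] == 6)].copy()
fiveLevelComment = df1.loc[(df1["评论作者等级"] == 5)].copy()
fourLevelComment = df1.loc[(df1["评论作者等级"] == 4)].copy()
threeLevelComment = df1.loc[(df1["评论作者等级"] == 3)].copy()
twoLevelComment = df1.loc[(df1["评论作者等级"] == 2)].copy()
zeroLevelComment = df1.loc[(df1["评论作者等级"] == 0)].copy()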
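
The per-level filters above can also be written as a single groupby. The following is only a minimal sketch under assumptions: the sample rows are made up for illustration, and the name comments_by_level is hypothetical; only the column names come from the code above.

```python
import pandas as pd

# Made-up sample rows for illustration; only the column names match the article.
df1 = pd.DataFrame({
    "评论作者 UID": [101, 102, 103, 104],
    "评论作者等级": [6, 5, 6, 0],
    "评论内容": ["很好看", "一般般", "太棒了", "不喜欢"],
})

# One independent DataFrame per level, keyed by the level value.
# comments_by_level is a hypothetical name, not part of the original code.
comments_by_level = {
    level: group.copy()
    for level, group in df1.groupby("评论作者等级")
}

# Look up the table for one level, e.g. level 6.
print(comments_by_level[6])
```

With this layout, a level that has no comments is simply absent from the dictionary, so it is worth checking for the key before using it.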

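from snownlp import SnowNLP            # library used for sentiment analysis

def commentsEmotion(comments):
    # Label every comment in the "评论内容" column as
    # "正向" (positive), "中性" (neutral) or "负向" (negative)
    CommentEmotion = []
    for comment in comments["评论内容"]:
        s = SnowNLP(comment)
        sc = s.sentiments              # score between 0 and 1; closer to 1 means more positive
        if sc >= 0.6:
            CommentEmotion.append("正向")
        elif sc > 0.2:
            CommentEmotion.append("中性")
        else:                          # sc <= 0.2
            CommentEmotion.append("负向")
    return CommentEmotion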

sixLevelComment.loc[:,"评论情感分析"] = commentsEmotion(sixLevelComment)
fiveLevelComment.loc[:,"评论情感分析"] = commentsEmotion(fiveLevelComment)
fourLevelComment.loc[:,"评论情感分析"] = commentsEmotion(fourLevelComment)
threeLevelComment.loc[:,"评论情感分析"] = commentsEmotion(threeLevelComment)
twoLevelComment.loc[:,"评论情感分析"] = commentsEmotion(twoLevelComment)
zeroLevelComment.loc[:,"评论情感分析"] = commentsEmotion(zeroLevelComment)

import os

os.makedirs("CommentEmotion", exist_ok=True)    # to_csv does not create a missing folder
sixLevelComment.to_csv("CommentEmotion/sixLevelComment.csv")
fiveLevelComment.to_csv("CommentEmotion/fiveLevelComment.csv")
fourLevelComment.to_csv("CommentEmotion/fourLevelComment.csv")
threeLevelComment.to_csv("CommentEmotion/threeLevelComment.csv")
twoLevelComment.to_csv("CommentEmotion/twoLevelComment.csv")
zeroLevelComment.to_csv("CommentEmotion/zeroLevelComment.csv")

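Here we define a function that runs sentiment analysis on the comments of each level, then use the loc method to add the result as the last column of each level's table.

The tables are then saved to the CommentEmotion folder.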
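
The thresholds inside commentsEmotion can be looked at in isolation. The sketch below is not the original function: label_sentiment and its two parameters are hypothetical names, and the scores are made up rather than produced by SnowNLP. It only illustrates how the 0.6 and 0.2 cut-offs split the score range.

```python
def label_sentiment(score, positive_at=0.6, negative_at=0.2):
    """Map a score between 0 and 1 to one of the three labels used above."""
    if score >= positive_at:
        return "正向"   # positive: score >= 0.6
    if score > negative_at:
        return "中性"   # neutral: 0.2 < score < 0.6
    return "负向"       # negative: score <= 0.2


# Made-up scores, including both boundary values.
scores = (0.95, 0.6, 0.4, 0.2, 0.05)
labels = [label_sentiment(s) for s in scores]
print(list(zip(scores, labels)))
```

Note that the boundaries are asymmetric: a score of exactly 0.6 counts as positive, while a score of exactly 0.2 counts as negative.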

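Data visualization

The processed data has been saved to CSV files, so now we can visualize it.

Use pandas to load the data saved in the folder and count the labels.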

import pandas as pd

sixLevelDf = pd.read_csv("CommentEmotion/sixLevelComment.csv")["评论情感分析"].value_counts()
fiveLevelDf = pd.read_csv("CommentEmotion/fiveLevelComment.csv")["评论情感分析"].value_counts()
fourLevelDf = pd.read_csv("CommentEmotion/fourLevelComment.csv")["评论情感分析"].value_counts()
threeLevelDf = pd.read_csv("CommentEmotion/threeLevelComment.csv")["评论情感分析"].value_counts()
twoLevelDf = pd.read_csv("CommentEmotion/twoLevelComment.csv")["评论情感分析"].value_counts()
zeroLevelDf = pd.read_csv("CommentEmotion/zeroLevelComment.csv")["评论情感分析"].value_counts()

value_counts() counts how many comments fall under each sentiment label.
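
value_counts() only lists labels that actually occur, and orders them by count. As a hedged sketch (the labels below are made up and ORDER is a hypothetical name), reindexing gives every level the same three rows in the same order, and dividing by the total turns counts into shares:

```python
import pandas as pd

# Fixed label order: positive, neutral, negative (hypothetical constant).
ORDER = ["正向", "中性", "负向"]

# Made-up labels for one level; note that "中性" does not occur at all.
labels = pd.Series(["正向", "正向", "负向", "正向"], name="评论情感分析")

# Count each label, then force the fixed order and fill missing labels with 0.
counts = labels.value_counts().reindex(ORDER, fill_value=0)

# Convert the counts to the share of each label.
shares = counts / counts.sum()

print(counts)
print(shares)
```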

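from matplotlib import pyplot as plt     # import matplotlib to start the visualization


# Define the show function
def show(data, level):
    plt.rcParams['font.sans-serif'] = ['SimHei']     # use a font with Chinese glyphs to avoid garbled text
    plt.figure(figsize=(5, 3))                       # figure size
    labels = data.index
    sizes = data.values
    # Fixed color per sentiment label, so the charts are comparable across levels
    # (value_counts() orders labels by count, which differs from level to level)
    color_map = {'正向': 'green', '中性': 'yellow', '负向': 'red'}
    colors = [color_map.get(label, 'gray') for label in labels]
    # Pull out the first wedge; the length must match the number of wedges
    explode = [0.1 if i == 0 else 0 for i in range(len(sizes))]
    plt.pie(sizes,                                   # data to plot
            labels=labels,                           # wedge labels
            colors=colors,                           # wedge fill colors
            labeldistance=1.1,                       # distance of the wedge labels from the center
            startangle=90,                           # starting angle of the pie
            radius=0.5,                              # radius of the pie
            center=(0.5, 0.5),                       # center of the pie
            textprops={'fontsize': 9, 'color': 'k'}, # text properties of the labels
            pctdistance=0.6,                         # distance of the percentage labels from the center
            autopct='%.1f%%',                        # percentage format, one decimal place
            explode=explode,                         # offset of each wedge
            shadow=True                              # draw a shadow
    )
    plt.axis('equal')
    # Title: "Sentiment analysis of level-{level} comments on a Bilibili video"
    plt.title(f"B站某视频评论区{level}级情感分析")
    plt.show()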
        
show(sixLevelDf, 6)
show(fiveLevelDf, 5)
show(fourLevelDf, 4)
show(threeLevelDf, 3)
show(twoLevelDf, 2)
show(zeroLevelDf, 0)

We draw six charts here, one for the comments of each level.

Final charts
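
As a complement to the six pie charts, the shares for all levels can be put side by side in one table. This is a minimal sketch with made-up rows; labelled and ratio_table are hypothetical names, and it assumes the level and the sentiment label sit in the same DataFrame, as they do in the saved CSV files.

```python
import pandas as pd

# Made-up rows; the column names match the saved CSV files.
labelled = pd.DataFrame({
    "评论作者等级": [6, 6, 5, 5, 0],
    "评论情感分析": ["正向", "负向", "正向", "正向", "中性"],
})

# Rows are levels, columns are sentiment labels.
# normalize="index" makes every row sum to 1, i.e. the share within each level.
ratio_table = pd.crosstab(
    labelled["评论作者等级"],
    labelled["评论情感分析"],
    normalize="index",
)

print(ratio_table)
```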