## 百面机器学习! An Interview Handbook for Algorithm Engineers! | 码书

March 2 · 程序人生

Differences and connections between LDA (linear discriminant analysis) and PCA

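(1) Compute the mean vector μj of the samples in each class of the dataset, and the overall mean vector μ.

(2) Compute the within-class scatter matrix Sw and the total scatter matrix St, and obtain the between-class scatter matrix Sb = St − Sw.

(3) Perform an eigenvalue decomposition of the matrix Sw⁻¹Sb and sort the eigenvalues in descending order.

(4) Take the eigenvectors w1, …, wd corresponding to the d largest eigenvalues, and map each n-dimensional sample x to d dimensions through x′ = [w1ᵀx, …, wdᵀx]ᵀ.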
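The four steps above can be sketched in NumPy roughly as follows. This is a minimal illustration, not the book's own code: the function name `lda_project` is hypothetical, and the use of a pseudo-inverse for Sw is an assumption made here to guard against a singular within-class scatter matrix.

```python
import numpy as np

def lda_project(X, y, d):
    """Project samples X (shape [m, n]) onto d LDA directions.

    Returns the projected data (shape [m, d]) and the projection matrix W.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[1]

    # Step 1: overall mean; per-class means are computed inside the loop.
    mu = X.mean(axis=0)

    # Step 2: within-class scatter Sw, total scatter St, between-class Sb.
    Sw = np.zeros((n, n))
    for c in np.unique(y):
        Xc = X[y == c]
        diff = Xc - Xc.mean(axis=0)
        Sw += diff.T @ diff
    centered = X - mu
    St = centered.T @ centered
    Sb = St - Sw

    # Step 3: eigendecomposition of Sw^{-1} Sb, eigenvalues sorted descending.
    # pinv instead of inv is an assumption of this sketch (handles singular Sw).
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]

    # Step 4: keep the eigenvectors of the d largest eigenvalues and project.
    W = eigvecs[:, order[:d]].real
    return X @ W, W
```

Calling `lda_project(X, y, 1)` on two well-separated classes should place the two classes in disjoint intervals on the projected axis; the sign of the axis is arbitrary because eigenvectors are only defined up to scale.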

Proof of convergence of the K-means algorithm

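(1) E-step: compute the expectation of the latent variables, that is, the posterior distribution of each latent variable given the data and the current parameters.

(2) M-step: maximize the resulting expected log-likelihood with respect to the model parameters.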
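To make the link between these two steps and K-means concrete, here is a small sketch in which the assignment step plays the role of the E-step and the center update plays the role of the M-step. The function name `kmeans`, the random initialization, and the recorded `history` of the squared-error objective are choices made for this illustration. Since neither step can increase the objective, the recorded values should never go up, which is the intuition behind the convergence argument.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain K-means that also records the objective after every iteration."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct samples (fancy indexing copies them).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    history = []
    for _ in range(n_iter):
        # E-step analogue: assign every point to its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # M-step analogue: move each center to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
        # Sum of squared distances under the current labels and new centers.
        history.append(((X - centers[labels]) ** 2).sum())
    return centers, labels, history
```

The returned `labels` are the assignments made in the last iteration, before the final center update.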

......

15 front-line algorithm engineers,

Learning roadmap of 《百面机器学习》

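Wu Jun (吴军), author of 《浪潮之巅》 and 《数学之美》, also praised the book highly: "This book teaches readers how to build a bridge between computer theory and algorithms on one side and concrete applications on the other. It can give computing practitioners a leap in their understanding of theory, and it can also let engineers from non-computing backgrounds get to know computer science as a powerful tool."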


Imbalanced data is common in the real world, especially in sentiment-related corpora, which makes it difficult to train a classifier to distinguish latent sentiment in text data. We observe that humans often express a transition in emotion between two adjacent discourses with discourse markers such as "but", "though", and "while", and that the head discourse and the tail discourse usually indicate opposite emotional tendencies. Based on this observation, we propose a novel plug-and-play method, which first samples discourses according to transitional discourse markers and then validates their sentiment polarities with the help of a pretrained attention-based model. Because our method increases sample diversity in the first place, it can serve as an upstream preprocessing step in data augmentation. We conduct experiments on three public sentiment datasets with several frequently used algorithms. The results show that our method is consistently effective, even in highly imbalanced scenarios, and that it can easily be integrated with oversampling methods to boost performance on imbalanced sentiment classification.
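The first stage described above, sampling discourses around a transitional marker, might look roughly like the sketch below. It only covers the split into head and tail discourse. The marker list reuses the three examples quoted in the text, the function name `split_on_marker` is hypothetical, and the polarity validation with a pretrained attention-based model is not reproduced here.

```python
import re

# Markers quoted in the text; a real marker inventory would likely be longer.
MARKERS = ("but", "though", "while")

# Word boundaries keep "though" from matching inside "although".
_PATTERN = re.compile(r"\b(" + "|".join(MARKERS) + r")\b", flags=re.IGNORECASE)

def split_on_marker(sentence):
    """Split a sentence at its first transitional marker.

    Returns (head, marker, tail), or None when there is no marker
    or when either side of the marker is empty.
    """
    match = _PATTERN.search(sentence)
    if match is None:
        return None
    head = sentence[:match.start()].strip(" ,;")
    tail = sentence[match.end():].strip(" ,;.")
    if not head or not tail:
        return None
    return head, match.group(1).lower(), tail
```

Under the paper's observation, the head and tail returned by this split would be candidates for opposite polarities, to be confirmed by a separate sentiment model before being added to the training data.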
