my first post

2024-06-14

Chapter 1

Testing some common math formulas and Markdown syntax.

Core idea and mathematical derivation:

Following that logic, our co-occurrence matrix will look like this:

[Table: co-occurrence matrix]

From the table above, we can see that the co-occurrence matrix is symmetric: the value at row X, column Y is the same as the value at row Y, column X. In general, the matrix does not need a row and column for every word in the text corpus, only for the words of interest.
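
The symmetry described above can be checked with a small sketch. The function name `cooccurrence_counts`, the `window` parameter, and the sample sentence are illustrative assumptions, not taken from the original post. The sketch uses plain counts, whereas GloVe itself down-weights more distant pairs.

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=2):
    """Count how often each (word, context word) pair appears within `window` tokens.

    Hypothetical helper for illustration: every position contributes one count
    for each neighbor on either side, so the result is symmetric.
    """
    counts = defaultdict(int)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                counts[(word, tokens[j])] += 1
    return dict(counts)

# Assumed toy corpus, not from the original post.
tokens = "the cat sat on the mat".split()
X = cooccurrence_counts(tokens, window=1)

# Symmetry: entry (a, b) always equals entry (b, a).
is_symmetric = all(X[(a, b)] == X.get((b, a)) for (a, b) in X)
```

Storing only the pairs that actually occur (a dictionary instead of a dense matrix) reflects the point that not every element of the corpus needs a slot.
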
The mathematical derivation of GloVe (Global Vectors for Word Representation) consists mainly of the following steps:

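Building the co-occurrence matrix: from the corpus, build a co-occurrence matrix $X$ whose entry $X_{ij}$ is the number of times word $i$ and context word $j$ appear together within a given context window.
The approximation formula: propose an approximate relationship between the word vectors and the co-occurrence matrix: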
$$
w_i^T \tilde{w_j} + b_i + \tilde{b_j} = \log(X_{ij})
$$
Here, $w_i$ and $\tilde{w_j}$ are word vectors, and $b_i$ and $\tilde{b_j}$ are bias terms.
Building the loss function:
$$
J = \sum_{i,j=1}^{V} f(X_{ij}) (w_i^T \tilde{w_j} + b_i + \tilde{b_j} - \log(X_{ij}))^2
$$
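The loss function uses a weighting function $f(X_{ij})$, whose purpose is to give frequently co-occurring word pairs more weight while keeping that weight from growing too large.
Form of the weighting function: a piecewise function is chosen: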
$$
f(x) = \begin{cases}
(x/x_{max})^\alpha & \text{if } x < x_{max} \\
1 & \text{otherwise}
\end{cases}
$$
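
As a rough illustration of the loss and weighting function above, here is a minimal pure-Python sketch. The function names `weight` and `glove_loss`, the dictionary layout of `X`, and the defaults `x_max=100` and `alpha=0.75` are assumptions for this example (the defaults are the values commonly quoted for GloVe, not something stated in this post). Zero counts are skipped, since $\log(0)$ is undefined and $f(0) = 0$ anyway.

```python
import math

def weight(x, x_max=100.0, alpha=0.75):
    """Piecewise weighting f(x): (x / x_max)^alpha below x_max, 1 otherwise."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(X, w, w_tilde, b, b_tilde, x_max=100.0, alpha=0.75):
    """Weighted least-squares loss J over the nonzero entries of X.

    X maps (i, j) index pairs to co-occurrence counts (assumed layout);
    w and w_tilde are lists of vectors, b and b_tilde are lists of biases.
    """
    total = 0.0
    for (i, j), x_ij in X.items():
        if x_ij <= 0:
            continue  # skip empty cells: log(0) is undefined
        dot = sum(a * c for a, c in zip(w[i], w_tilde[j]))
        residual = dot + b[i] + b_tilde[j] - math.log(x_ij)
        total += weight(x_ij, x_max, alpha) * residual ** 2
    return total

# Toy check: vectors chosen so that w_0 . w~_1 equals log(5) exactly.
X_toy = {(0, 1): 5.0}
w = [[1.0, 0.0], [0.0, 0.0]]
w_tilde = [[0.0, 0.0], [math.log(5.0), 0.0]]
b = [0.0, 0.0]
b_tilde = [0.0, 0.0]
loss_perfect_fit = glove_loss(X_toy, w, w_tilde, b, b_tilde)
```

When the approximation holds exactly for every observed pair, each residual is zero and so is the loss; training would adjust the vectors and biases to push the loss toward that point.
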

Reference: https://www.fanyeong.com/2018/02/19/glove-in-detail/