Data Charging Station

2017-05-12

TENSORFLOW Exercise 4: word2vec

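Turning words into word embeddings

We want to capture the relationships between words, instead of representing them as scattered, meaningless symbols.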

The idea behind this problem is:

If two words appear in the same context in different sentences, the two words have similar meanings.

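Today we will use the skip-gram model, which works somewhat like a binary classification (similar or not similar).

As in the previous exercises, we start with data processing.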

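Count how often each word appears, most frequent first: [(most common word1, n1), (second most common word2, n2), ...]
Convert the text into numbers (word ids). Take the sentence:
The actual code for this tutorial is very short
With a window of one word on each side, the (context, target) pairs are:
([the, code], actual), ([actual, for], code), ...
The skip-gram pairs (target, context) are:
(actual, the), (actual, code), (code, actual), ...
Along the way every word is replaced by its id, so the pairs end up looking like
(10,20),(10,30),(30,10),(30,40),..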
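The data-processing steps above can be sketched in plain Python. This is a minimal illustration, not the code from the original tutorial: the helper names `build_vocab` and `skip_gram_pairs`, the reserved `UNK` id 0, the lowercased sentence and the window of 1 are all assumptions made for the example.

```python
from collections import Counter

def build_vocab(words, vocab_size):
    """Return (counts, word_to_id); id 0 is reserved for unknown words (an assumption)."""
    counts = [("UNK", 0)]
    counts.extend(Counter(words).most_common(vocab_size - 1))
    word_to_id = {word: i for i, (word, _) in enumerate(counts)}
    return counts, word_to_id

def skip_gram_pairs(ids, window=1):
    """Return (center, context) id pairs for every context word within `window`."""
    pairs = []
    for i, center in enumerate(ids):
        lo = max(0, i - window)
        hi = min(len(ids), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, ids[j]))
    return pairs

sentence = "the actual code for this tutorial is very short".split()
counts, word_to_id = build_vocab(sentence, vocab_size=50)

# Replace every word by its id, falling back to the UNK id.
ids = [word_to_id.get(w, 0) for w in sentence]
pairs = skip_gram_pairs(ids, window=1)

# Map the ids back to words only to make the pairs readable.
id_to_word = {i: w for w, i in word_to_id.items()}
word_pairs = [(id_to_word[c], id_to_word[t]) for c, t in pairs]
print(word_pairs[1:5])
```

The model itself only ever sees the integer pairs in `pairs`; the word form is printed here just for inspection.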

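We use NCE loss.

I am not familiar with it yet. Roughly, we want the probability of the target word to be as high as possible, while the probabilities of K other words, the negative samples, stay very low.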
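To make the intuition concrete, here is a small sketch of the simpler negative-sampling objective, which is closely related to NCE but is not the same thing (full NCE also corrects for the noise distribution, and TensorFlow provides it as `tf.nn.nce_loss`). The function name `negative_sampling_loss` and the toy 2-D vectors are made up for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def negative_sampling_loss(center_vec, target_vec, negative_vecs):
    """Small when the true pair scores high and the K sampled pairs score low."""
    # Reward a high score for the real (center, target) pair.
    loss = -math.log(sigmoid(dot(center_vec, target_vec)))
    # Penalize high scores for each of the K negative samples.
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(center_vec, neg)))
    return loss

center = [1.0, 0.5]
good_target = [2.0, 1.0]                  # points the same way as center
negatives = [[-1.0, -1.0], [-2.0, 0.0]]   # K = 2 noise words pointing away

# Correct arrangement: real target is similar, negatives are dissimilar.
low = negative_sampling_loss(center, good_target, negatives)
# Swapped arrangement: a dissimilar word is treated as the target.
high = negative_sampling_loss(center, negatives[0], [good_target, negatives[1]])
print(low < high)
```

Training moves the vectors so that real pairs fall into the low-loss arrangement, without having to normalize over the whole vocabulary.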

king - queen = man - woman   ==>   king - queen + woman = man

Subtracting queen removes the part we do not want; I think that is roughly the intuition.
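The arithmetic can be tried on hand-made vectors. The sketch below is only an illustration: the 2-D vectors (one axis for "royalty", one for "gender") and the extra word `apple` are invented, whereas real embeddings are learned and have many more dimensions. The helper `analogy` is a hypothetical name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Invented toy embeddings: (royalty, gender).
embeddings = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [0.3, 0.1],
}

def analogy(a, b, c, embeddings):
    """Return the word closest to a - b + c, excluding a, b and c themselves."""
    query = [x - y + z for x, y, z in
             zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(query, candidates[w]))

# king - queen + woman
result = analogy("king", "queen", "woman", embeddings)
print(result)
```

With these toy vectors the query lands exactly on the vector for `man`; with learned embeddings the match is only approximate, which is why the nearest neighbour is used.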

Result

Similar words end up placed closer together.

[Figure: tf_word2vec]

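The original TensorFlow version uses sklearn's TSNE for dimensionality reduction, which is better than PCA in many respects. It is worth trying after reading up on it.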

My Github


Posted by:

kbwen
