# Loading and preparing the text8 dataset

We now perform the same loading and preprocessing steps with the text8 dataset:

```py
from datasetslib.text8 import Text8

text8 = Text8()
# downloads data, converts words to ids, converts files to a list of ids
text8.load_data()
print('Train:', text8.part['train'][0:5])
print('Vocabulary Length = ',text8.vocab_len)
```

The vocabulary contains roughly 254,000 words:

```py
Train: [5233, 3083, 11, 5, 194]
Vocabulary Length = 253854
```

Some tutorials manipulate this data by keeping only the most common words or by truncating the vocabulary to 10,000 words. We instead use the full dataset from the first file of the text8 dataset, together with its full vocabulary.

Prepare the CBOW pairs:

```py
text8.skip_window=2
text8.reset_index_in_epoch()
# in CBOW input is the context word and output is the target word
y_batch, x_batch = text8.next_batch_cbow()

print('The CBOW pairs : context,target')
for i in range(5 * text8.skip_window):
    print('(', [text8.id2word[x_i] for x_i in x_batch[i]],
          ',', y_batch[i], text8.id2word[y_batch[i]], ')')
```

The output is:

```py
The CBOW pairs : context,target
( ['anarchism', 'originated', 'a', 'term'] , 11 as )
( ['originated', 'as', 'term', 'of'] , 5 a )
( ['as', 'a', 'of', 'abuse'] , 194 term )
( ['a', 'term', 'abuse', 'first'] , 1 of )
( ['term', 'of', 'first', 'used'] , 3133 abuse )
( ['of', 'abuse', 'used', 'against'] , 45 first )
( ['abuse', 'first', 'against', 'early'] , 58 used )
( ['first', 'used', 'early', 'working'] , 155 against )
( ['used', 'against', 'working', 'class'] , 127 early )
( ['against', 'early', 'class', 'radicals'] , 741 working )
```

Prepare the skip-gram pairs:

```py
text8.skip_window=2
text8.reset_index_in_epoch()
# in skip-gram input is the target word and output is the context word
x_batch, y_batch = text8.next_batch()

print('The skip-gram pairs : target,context')
for i in range(5 * text8.skip_window):
    print('(',x_batch[i], text8.id2word[x_batch[i]],
          ',', y_batch[i], text8.id2word[y_batch[i]],')')
```

The output is:

```py
The skip-gram pairs : target,context
( 11 as , 5233 anarchism )
( 11 as , 3083 originated )
( 11 as , 5 a )
( 11 as , 194 term )
( 5 a , 3083 originated )
( 5 a , 11 as )
( 5 a , 194 term )
( 5 a , 1 of )
( 194 term , 11 as )
( 194 term , 5 a )
```
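
To make the windowing behind these two listings concrete, the following is a minimal pure-Python sketch of how such pairs can be built from a list of word ids. It is not the `datasetslib` implementation, which is not shown here. The helper names `cbow_pairs` and `skipgram_pairs` are hypothetical, and the ids are the ones that appear in the printed output above. The sketch assumes the window is `skip_window` words on each side of the target and that positions without a full window are skipped.

```python
def cbow_pairs(ids, skip_window):
    """Return (context, target) pairs for CBOW.

    Assumption: the context is the skip_window ids on each side of the
    target, and targets without a full window are skipped.
    """
    pairs = []
    for i in range(skip_window, len(ids) - skip_window):
        context = ids[i - skip_window:i] + ids[i + 1:i + skip_window + 1]
        pairs.append((context, ids[i]))
    return pairs


def skipgram_pairs(ids, skip_window):
    """Return (target, context) pairs for skip-gram, one per context word."""
    pairs = []
    for i in range(skip_window, len(ids) - skip_window):
        for j in range(i - skip_window, i + skip_window + 1):
            if j != i:
                pairs.append((ids[i], ids[j]))
    return pairs


# ids taken from the printed output above:
# anarchism, originated, as, a, term, of, abuse
ids = [5233, 3083, 11, 5, 194, 1, 3133]

for context, target in cbow_pairs(ids, skip_window=2):
    print('CBOW     :', context, '->', target)

for target, context in skipgram_pairs(ids, skip_window=2):
    print('skip-gram:', target, '->', context)
```

With `skip_window=2`, each CBOW example carries four context ids and one target, while skip-gram expands the same window into four separate (target, context) pairs, which is why the skip-gram listing repeats each target word four times.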