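When you drive Chrome through the Puppeteer API for data scraping or automated testing, you usually need to simulate mouse and keyboard input.
The examples below walk through these basic operations. All code in this post was run in the following environment:
1. `headless` is set to `false` in every example, so the tests run with a visible browser window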
2. Puppeteer version 1.7.0
3. Chrome version 70.0.3514.2 (user agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3514.2 Safari/537.36)
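## Mouse movement
Open the Baidu home page and move the mouse over the "More Products" button.
Main steps:
1. Get a handle to the "More Products" button with `page.$("#u1 > a.bri")`
2. Read its coordinates with `element.boundingBox()`
3. Move the mouse with `page.mouse.move(x, y)`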
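```
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto("https://www.baidu.com/");
  // Give the page a moment to finish rendering.
  await page.waitFor(1000);
  // Handle to the "More Products" button.
  const element = await page.$("#u1 > a.bri");
  // boundingBox() gives the top-left corner (x, y) plus width and height.
  const box = await element.boundingBox();
  // Aim at the center of the button.
  const x = box.x + (box.width / 2);
  const y = box.y + (box.height / 2);
  await page.mouse.move(x, y);
  // Keep the window open long enough to see the hover effect.
  await page.waitFor(10000);
  await page.close();
  await browser.close();
})();
```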
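`boundingBox()` resolves to `null` when the element is not visible, and in that case the center calculation above throws. A minimal sketch of a guard, assuming a hypothetical helper named `centerOfBox` (it is not part of Puppeteer) that takes whatever `boundingBox()` returned:

```javascript
// Hypothetical helper, not part of Puppeteer: returns the center point of a
// bounding box, or null if the element has no box (for example, it is hidden).
function centerOfBox(box) {
  if (!box) {
    return null;
  }
  return {
    x: box.x + box.width / 2,
    y: box.y + box.height / 2,
  };
}

// Example with a hand-written box instead of a live page.
const center = centerOfBox({x: 100, y: 40, width: 60, height: 20});
console.log(center); // { x: 130, y: 50 }
```

In the script above this could be used as `const center = centerOfBox(await element.boundingBox());`, followed by a `null` check before calling `page.mouse.move(center.x, center.y)`.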
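## Mouse drag
Open the Baidu Maps home page and simulate a drag. The Puppeteer version used here does not provide a drag API directly, so the drag is simulated with three operations: press the mouse button, move the mouse, release the button. Running the code below drags the map.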
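```
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto("https://map.baidu.com/");
  await page.waitFor(1000);
  // Move to the starting point and press the left button.
  await page.mouse.move(500, 400);
  await page.mouse.down();
  // Move to the end point in 1000 intermediate steps so the map follows the pointer.
  await page.mouse.move(100, 200, {steps: 1000});
  // Release the button to finish the drag.
  await page.mouse.up();
  await page.waitFor(10000);
  await page.close();
  await browser.close();
})();
```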
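The press, move, release sequence can be wrapped in a small function. The sketch below assumes a hypothetical helper named `drag` (not part of Puppeteer) that accepts any object with `move`, `down`, and `up` methods, which is the shape of `page.mouse`. To keep the example runnable without a browser, it is exercised with a stand-in object that only records calls.

```javascript
// Hypothetical helper, not part of Puppeteer: drags from one point to another
// using only move/down/up.
async function drag(mouse, from, to, steps = 20) {
  await mouse.move(from.x, from.y);
  await mouse.down();
  // `steps` asks for intermediate mousemove events along the way.
  await mouse.move(to.x, to.y, {steps});
  await mouse.up();
}

// Stand-in for page.mouse that only records the calls it receives.
function createRecordingMouse() {
  const calls = [];
  return {
    calls,
    move: async (x, y, options) => { calls.push(['move', x, y, options]); },
    down: async () => { calls.push(['down']); },
    up: async () => { calls.push(['up']); },
  };
}

const fakeMouse = createRecordingMouse();
drag(fakeMouse, {x: 500, y: 400}, {x: 100, y: 200}, 10)
  .then(() => console.log(fakeMouse.calls.map((call) => call[0]).join(' -> ')));
// prints: move -> down -> move -> up
```

With a real page, the call would be `await drag(page.mouse, {x: 500, y: 400}, {x: 100, y: 200}, 1000);`.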
## Mouse click
This time we open the Baidu home page, click the News link, and get the HTML of the page we land on. The example code follows.
```
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto("https://www.baidu.com/");
  // First link in the top-right bar: the News link.
  const newsLink = await page.$("#u1 > a:nth-child(1)");
  const box = await newsLink.boundingBox();
  const x = box.x + (box.width / 2);
  const y = box.y + (box.height / 2);
  // Click and wait for the resulting navigation together.
  await Promise.all([
    page.mouse.click(x, y),
    page.waitForNavigation()
  ]);
  // Safe to read now: the navigation has finished.
  const content = await page.content();
  console.log(content);
  await page.waitFor(20000);
  await browser.close();
})();
```
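Clicking the link triggers a navigation. If you call `page.content()` right after the click without waiting, it throws the error below (to see this, comment out `page.waitForNavigation()` in the example):
> UnhandledPromiseRejectionWarning: Error: Execution context was destroyed, most likely because of a navigation.
So you need to call `page.waitForNavigation()` to wait until the navigation has finished; only then does `page.content()` return the page content normally.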
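The click-then-wait pattern can be packaged so it is hard to forget the wait. This is a sketch only: `clickAndWaitForNavigation` is a hypothetical name, not a Puppeteer function, and it assumes a `page` object that exposes `waitForNavigation()` and `mouse.click(x, y)` as Puppeteer's `Page` does.

```javascript
// Hypothetical helper, not part of Puppeteer: clicks at (x, y) and resolves
// once the navigation triggered by the click has finished.
async function clickAndWaitForNavigation(page, x, y) {
  // Both promises are created before either is awaited, so the wait is
  // already registered when the click lands.
  const [response] = await Promise.all([
    page.waitForNavigation(),
    page.mouse.click(x, y),
  ]);
  // Whatever waitForNavigation() resolved with is passed through.
  return response;
}
```

In the example above, the `Promise.all([...])` call would become `await clickAndWaitForNavigation(page, x, y);`, after which `page.content()` can be called safely.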
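## Keyboard input
In this example we first open my blog home page, https://blog.csdn.net/Revivedsun, type a search term into the search box, and then press Enter. Pressing Enter opens a new tab, and we then get that tab's HTML. The complete example comes first.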
```
const puppeteer = require('puppeteer');

// Resolves with the Page of the next tab the browser opens, once its DOM is ready.
function getNewPageWhenLoaded(browser) {
  return new Promise((x) => browser.once('targetcreated', async (target) => {
    const newPage = await target.page();
    // Resolves the outer promise when the new page fires domcontentloaded.
    const newPagePromise = new Promise(() => newPage.once('domcontentloaded', () => x(newPage)));
    const isPageLoaded = await newPage.evaluate(() => document.readyState);
    // Already loaded: resolve now. Otherwise leave it to the listener above.
    return isPageLoaded.match('complete|interactive') ? x(newPage) : newPagePromise;
  }));
}

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto("https://blog.csdn.net/Revivedsun");
  // Focus the search box in the toolbar and type the query.
  await page.focus("input#toolber-keyword");
  await page.keyboard.type("Puppeteer", {delay: 100});
  // Start listening before pressing Enter so the new tab is not missed.
  const newPagePromise = getNewPageWhenLoaded(browser);
  await page.keyboard.press("Enter", {delay: 50});
  const newPage = await newPagePromise;
  const content = await newPage.content();
  console.log(content);
})();
```
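After the new tab opens we need its page object, which we get by listening for the `targetcreated` event. When `targetcreated` fires, the callback obtains the tab's page object and then listens for `domcontentloaded` to wait for the page to load. Once the page has loaded, the callback calls the resolve function of the Promise constructor (named `x` in the code) to hand the page object to the caller. If the page has not loaded yet, the callback returns the pending Promise instead. The load state is read from `document.readyState`.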
```
function getNewPageWhenLoaded(browser) {
  return new Promise((x) => browser.once('targetcreated', async (target) => {
    const newPage = await target.page();
    const newPagePromise = new Promise(() => newPage.once('domcontentloaded', () => x(newPage)));
    const isPageLoaded = await newPage.evaluate(() => document.readyState);
    return isPageLoaded.match('complete|interactive') ? x(newPage) : newPagePromise;
  }));
}
```
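One case the function above does not cover is when no tab ever opens: the returned promise then stays pending forever. A minimal sketch of a timeout guard, assuming a hypothetical helper named `withTimeout` (not part of Puppeteer) that can wrap any promise:

```javascript
// Hypothetical helper, not part of Puppeteer: rejects if `promise` has not
// settled within `ms` milliseconds.
function withTimeout(promise, ms, message = 'Timed out') {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// A promise that never settles stands in for "no new tab was opened".
withTimeout(new Promise(() => {}), 50, 'No new tab within 50 ms')
  .catch((error) => console.log(error.message));
// prints: No new tab within 50 ms
```

In the keyboard example this could be used as `const newPage = await withTimeout(getNewPageWhenLoaded(browser), 5000);`, where the 5000 ms limit is an arbitrary choice.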
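This approach gives you the page object of the newly opened tab. Alternatively, you can set the element's `target` attribute to `_self` so that the new page opens in the current tab. In that case, use the approach from the "Mouse click" section, `page.waitForNavigation()` followed by `page.content()`, to get the HTML of the page you land on. The attribute can be changed as follows.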
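```
// cssSelector: selector of the element whose target should change.
const elementHandler = await page.$(cssSelector);
// The function runs in the page; `e` is the DOM element behind the handle.
await page.evaluateHandle((e) => {
  e.target = '_self';
}, elementHandler);
```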
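If several elements need the same change, `page.$$eval` can apply it to every match in one call. The following is a sketch under that assumption: `openInSameTab` is a hypothetical name, not a Puppeteer function, and the selector you pass is up to you.

```javascript
// Hypothetical helper, not part of Puppeteer: sets target="_self" on every
// element matching `selector` and resolves with how many were changed.
function openInSameTab(page, selector) {
  // The callback runs inside the page, so it must not use outer variables.
  return page.$$eval(selector, (elements) => {
    elements.forEach((element) => { element.target = '_self'; });
    return elements.length;
  });
}
```

A call such as `await openInSameTab(page, 'a[target="_blank"]');` would then rewrite all links that would otherwise open a new tab.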
# References
1. Changing the `target` attribute: https://github.com/GoogleChrome/puppeteer/issues/386#issuecomment-343059315
2. Keyboard key table: https://github.com/GoogleChrome/puppeteer/blob/v1.7.0/lib/USKeyboardLayout.js
3. Getting the newly opened tab through an event listener: https://github.com/GoogleChrome/puppeteer/issues/386

---------------------
Author: FserSuN
Original: https://blog.csdn.net/revivedsun/article/details/82121123