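Firecrawl Web Scraping
The Firecrawl tool provides intelligent web scraping.
Features
- Intelligent content extraction
- Link discovery
- Screenshots
- JavaScript rendering
- Sitemap support
Configuration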
yaml
tools:
  firecrawl:
    enabled: true
    api_key: "${FIRECRAWL_API_KEY}"
    url: "https://api.firecrawl.dev"
    timeout: 30000
    options:
      only_main_content: true
      screenshot: true
      javascript: true
      cache: true

Usage Examples
typescript
// Scrape the content of a page
const result = await tool.firecrawl.scrape({
  url: 'https://example.com/article',
  onlyMainContent: true
});
console.log('Title:', result.title);
console.log('Content:', result.content);

// Scrape a page and take a screenshot
const screenshot = await tool.firecrawl.screenshot({
  url: 'https://example.com',
  format: 'png',
  fullPage: false
});

// Discover links
const links = await tool.firecrawl.discover({
  url: 'https://example.com',
  depth: 2
});

Output Format
typescript
interface ScrapeResult {
  success: boolean;
  title: string;
  description: string;
  keywords: string[];
  language: string;
  content: string; // Markdown format
  links: Link[];
  screenshot?: string; // Base64-encoded image
  metadata: {
    pageSize: number;
    crawlTime: number;
    jsEnabled: boolean;
  };
}

Batch Scraping
typescript
// Scrape multiple URLs
const results = await tool.firecrawl.batchScrape({
  urls: [
    'https://example.com/page1',
    'https://example.com/page2',
    'https://example.com/page3'
  ],
  options: {
    onlyMainContent: true
  }
});
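
A batch call can succeed for some URLs and fail for others, so it may help to split the results by their `success` flag before using them. The sketch below is only an illustration: `partitionResults` and `ScrapeResultLike` are hypothetical names, not part of the Firecrawl tool, and the code assumes that `batchScrape` returns an array of results in the same order as the input URLs, which this document does not confirm.

```typescript
// Minimal subset of the ScrapeResult shape from the Output Format section.
interface ScrapeResultLike {
  success: boolean;
  title: string;
  content: string;
}

// Hypothetical helper (not provided by the Firecrawl tool).
// Pairs each URL with its result and splits them by the `success` flag.
// Assumption: results[i] corresponds to urls[i].
function partitionResults<T extends ScrapeResultLike>(
  urls: string[],
  results: T[]
): { ok: Array<{ url: string; result: T }>; failed: string[] } {
  const ok: Array<{ url: string; result: T }> = [];
  const failed: string[] = [];

  urls.forEach((url, i) => {
    const result = results[i];
    // A missing entry is treated the same as an unsuccessful scrape.
    if (result !== undefined && result.success) {
      ok.push({ url, result });
    } else {
      failed.push(url);
    }
  });

  return { ok, failed };
}
```

With the batch call above, this could be used as `partitionResults(urls, results)`; the `failed` list can then be retried or logged while the `ok` entries are processed.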