Diffbot

Learn how to use Diffbot with Composio

Overview

Enum

DIFFBOT

Description

Diffbot provides AI-powered tools to extract and structure data from web pages, transforming unstructured web content into structured, linked data.

Authentication Details

Actions

Tool to retrieve account details, including plan information and usage statistics. Use after authenticating to verify subscription and daily quota status.

Action Parameters

Action Response

data (object)
error (string)
successful (boolean)
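
Every action on this page reports the same three response fields: `data`, `error`, and `successful`. The following is a minimal Python sketch of unwrapping such a result, assuming it arrives as a plain dictionary. The `unwrap` helper, the `ActionError` class, and the sample values are hypothetical names chosen for illustration; they are not part of Composio or Diffbot.

```python
from typing import Any, Dict


class ActionError(RuntimeError):
    """Raised when a result reports successful == False (hypothetical helper)."""


def unwrap(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return the `data` object, or raise with the `error` string."""
    if not result.get("successful", False):
        raise ActionError(result.get("error") or "unknown error")
    return result.get("data") or {}


# Illustrative result shaped like the Action Response fields; values are made up.
sample = {"data": {"plan": "example"}, "error": None, "successful": True}
account = unwrap(sample)
```

Checking `successful` before reading `data` keeps failure handling in one place instead of scattering it across every call site.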
Tool to automatically determine a page's content type and route it to the appropriate extraction API. Use when you have only a URL and need Diffbot to choose the right extractor.

Action Parameters

callback (string)
fields (string)
url (string, required)

Action Response

data (object)
error (string)
successful (boolean)
Tool to extract information from articles, including authors, publication dates, and images. Use when you need structured metadata from a web article URL.

Action Parameters

callback (string)
discussion (boolean)
fields (array)
mode (string)
paging (string)
stats (boolean)
timeout (integer)
url (string, required)

Action Response

data (object)
error (string)
successful (boolean)
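
The article parameters listed above can be modelled as a typed structure so that the required `url` and the optional settings are explicit. This is a sketch only: the `ArticleParams` class and its `to_payload` method are hypothetical, the field names `title` and `author` are assumed examples, and the source does not state the unit of `timeout`.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ArticleParams:
    """Mirrors the article-extraction parameters listed above (hypothetical class)."""

    url: str                            # required
    callback: Optional[str] = None
    discussion: Optional[bool] = None
    fields: Optional[List[str]] = None  # array
    mode: Optional[str] = None
    paging: Optional[str] = None
    stats: Optional[bool] = None
    timeout: Optional[int] = None       # integer; unit not stated in the source

    def to_payload(self) -> Dict[str, Any]:
        """Drop unset options so only explicitly chosen values are sent."""
        return {k: v for k, v in asdict(self).items() if v is not None}


params = ArticleParams(
    url="https://example.com/post",
    fields=["title", "author"],  # assumed field names, for illustration
    discussion=False,
).to_payload()
```

Filtering on `is not None` rather than truthiness matters here, because an explicit `discussion=False` must still be sent.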
Tool to extract threads of content from forums, comment sections, and review pages. Use when you need structured discussion data from a web page and have already identified the discussion URL.

Action Parameters

discussion (boolean, defaults to True)
fields (string)
maxPages (integer, defaults to 1)
norender (boolean)
url (string, required)

Action Response

data (object)
error (string)
successful (boolean)
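
Two of the discussion parameters carry documented defaults (`discussion` is True and `maxPages` is 1). The sketch below shows one way to make those defaults visible on the caller's side and to reject misspelled option names before a request is made. The `discussion_params` helper is hypothetical and the thread URL is a placeholder.

```python
from typing import Any, Dict

# Defaults and allowed option names taken from the parameter list above.
DISCUSSION_DEFAULTS: Dict[str, Any] = {"discussion": True, "maxPages": 1}
DISCUSSION_OPTIONS = {"discussion", "fields", "maxPages", "norender"}


def discussion_params(url: str, **overrides: Any) -> Dict[str, Any]:
    """Merge caller overrides onto the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - DISCUSSION_OPTIONS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"url": url, **DISCUSSION_DEFAULTS, **overrides}


# Request three pages of a thread instead of the default single page.
params = discussion_params("https://example.com/thread", maxPages=3)
```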
Tool to extract event details from web pages. Use when you need structured event data such as venue, date, and description.

Action Parameters

callback (string)
fields (string)
paging (boolean)
timeout (integer)
url (string, required)

Action Response

data (object)
error (string)
successful (boolean)
Tool to extract detailed information about images, including dimensions and recognition data. Use after confirming the image URL is publicly accessible.

Action Parameters

fields (array)
paging (boolean)
timeout (integer)
url (string, required)

Action Response

data (object)
error (string)
successful (boolean)
Tool to extract product information such as specifications, prices, availability, and reviews. Use when you need this data in structured form from a product page.

Action Parameters

callback (string)
discussion (boolean)
fields (array)
mode (string)
paging (boolean)
timeout (integer)
url (string, required)

Action Response

data (object)
error (string)
successful (boolean)
Tool to extract information from videos, including titles, descriptions, and embedded HTML. Use when you need structured video metadata from any web page.

Action Parameters

callback (string)
discussion (boolean)
fallback (boolean)
fields (array)
mode (string)
paging (boolean)
timeout (integer)
url (string, required)

Action Response

data (object)
error (string)
successful (boolean)
Tool to list all bulk jobs associated with a specific token. Use after authenticating to retrieve the status of every job on the account.

Action Parameters

Action Response

data (object)
error (string)
successful (boolean)
Tool to resolve lost IDs in the Knowledge Graph. Use when you need to map a lost identifier to its canonical counterpart for data consistency.

Action Parameters

lostId (string, required)
type (string)

Action Response

data (object)
error (string)
successful (boolean)
Tool to start a bulk extract job. Use when processing large numbers of URLs asynchronously.

Action Parameters

apiUrl (string, required)
jobConfig (object)
name (string)
notifyEmail (string)
notifyWebhook (string)
urlList (string)
urls (array)

Action Response

data (object)
error (string)
successful (boolean)
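
A bulk job takes a required `apiUrl` plus a set of URLs to process. The sketch below assembles such a payload from the `urls` array form and removes blank and duplicate entries first. The `bulk_job_payload` helper is hypothetical, and the `apiUrl` value shown is an assumption about which extraction endpoint the job should apply; the source does not list accepted values or describe how `urlList` differs from `urls`.

```python
from typing import Any, Dict, Iterable, List, Optional


def bulk_job_payload(
    api_url: str,
    urls: Iterable[str],
    name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build bulk-job parameters with de-duplicated URLs (hypothetical helper)."""
    if not api_url:
        raise ValueError("apiUrl is required")
    seen = set()
    unique: List[str] = []
    for raw in urls:
        url = raw.strip()
        if url and url not in seen:  # skip blanks and repeats, keep first-seen order
            seen.add(url)
            unique.append(url)
    payload: Dict[str, Any] = {"apiUrl": api_url, "urls": unique}
    if name is not None:
        payload["name"] = name
    return payload


payload = bulk_job_payload(
    "https://api.diffbot.com/v3/article",  # assumed endpoint value
    ["https://example.com/a", " https://example.com/a ", "", "https://example.com/b"],
    name="example-job",
)
```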
Tool to spider a site for links and process them with the Extract API into a single collection. Use when you have seed URLs and want to collect structured data across a site. Requires a Plus plan for Crawl API access.

Action Parameters

apiUrl (string)
crawlDelay (number)
customHeaders (object)
maxToCrawl (integer)
maxToProcess (integer)
name (string, required)
notifyEmail (string)
obeyRobotsTxt (boolean, defaults to True)
repeat (string)
seeds (array, required)
type (string, required)

Action Response

data (object)
error (string)
successful (boolean)
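
The crawl action has three required parameters (`name`, `seeds`, and `type`), so a quick local check before submitting can save a failed round trip. The following sketch uses a hypothetical `missing_crawl_keys` helper; the `type` value is a placeholder, since the source does not list the accepted values.

```python
from typing import Any, Dict, List

# Required parameter names taken from the list above.
REQUIRED_CRAWL_KEYS = ("name", "seeds", "type")


def missing_crawl_keys(params: Dict[str, Any]) -> List[str]:
    """Return required keys that are absent or empty (hypothetical helper)."""
    return [key for key in REQUIRED_CRAWL_KEYS if params.get(key) in (None, "", [])]


crawl_params = {
    "name": "example-crawl",
    "seeds": ["https://example.com/"],
    "type": "crawl",        # placeholder; accepted values are not listed in the source
    "maxToCrawl": 100,
    "obeyRobotsTxt": True,  # matches the documented default
}
missing = missing_crawl_keys(crawl_params)
```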
Tool to stop a running bulk job. Use when you need to halt further processing of URLs in a job in progress. Invoke only after confirming the jobId to avoid accidental stoppage.

Action Parameters

jobId (string, required)

Action Response

data (object)
error (string)
successful (boolean)
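
Because stopping a job should happen only after the `jobId` has been confirmed, one option is to check the identifier against a set of known job IDs before invoking the action. This is a sketch under assumptions: the `confirm_job_id` helper is hypothetical, and how the known IDs are obtained (for example, from the job-listing action's `data` object) depends on a response structure the source does not describe.

```python
from typing import Iterable


def confirm_job_id(job_id: str, known_job_ids: Iterable[str]) -> str:
    """Return job_id unchanged if it is a known job; otherwise raise (hypothetical helper)."""
    if job_id not in set(known_job_ids):
        raise LookupError(f"job {job_id!r} not found; refusing to stop it")
    return job_id


# Placeholder IDs; real values would come from the account's job list.
known = ["job-001", "job-002"]
stop_params = {"jobId": confirm_job_id("job-002", known)}
```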