AI search startup Perplexity has been accused of hiding its identity to collect data from non-consenting websites.

Researchers at networking giant Cloudflare lambasted the AI company for hoovering content from websites it was explicitly blocked from accessing, then using the content to generate responses to user prompts.

“There are clear preferences that crawlers should be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences,” Cloudflare said.

Cloudflare identified Perplexity’s “stealth crawling behaviour” after receiving complaints from customers who had configured firewalls and anti-bot rules to disallow the company from crawling their websites.

“These customers told us that Perplexity was still able to access their content even when they saw its bots successfully blocked,” said Cloudflare.

Perplexity later disputed the findings, arguing apparent technical errors in Cloudflare’s analysis “aren't just embarrassing – they're disqualifying”.

“Cloudflare's recent blog post managed to get almost everything wrong about how modern AI assistants actually work,” wrote Perplexity.

Cloudflare has de-listed Perplexity as a verified bot and adjusted its rules to block the company’s purported “stealth crawling”.

Sneaking past the bouncer

To protect their content from bots, website owners can list the known identities of web crawlers and give them instructions in ‘robots.txt’ files.

For example, listing “User-agent: GPTBot… Disallow: /” would instruct OpenAI’s declared crawler to stay away from the entire site, and listing ‘PerplexityBot’ or ‘Perplexity-User’ would theoretically do the same for Perplexity.
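As a rough illustration of how such directives are read, the sketch below uses Python's standard-library robots.txt parser. The file contents, the example.com URL and the user-agent strings are assumptions made up for this example, not rules taken from any real site. It also shows the limit of the mechanism: the rules only apply to a visitor that announces itself under a listed name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; a real site's file will differ.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative user-agent strings (assumptions, not captured traffic).
GENERIC_BROWSER = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

for agent in ("GPTBot", "PerplexityBot", "Perplexity-User", GENERIC_BROWSER):
    verdict = "allowed" if allowed(agent, "https://example.com/article") else "disallowed"
    print(f"{agent[:40]:<40} {verdict}")
```

Because robots.txt is advisory, nothing in the file itself stops a client that ignores it or that presents a browser-style user agent no rule matches.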

Cloudflare found that, instead of respecting these measures, Perplexity simply ignored them.

To confirm their theory, Cloudflare researchers registered multiple new domain names which were not indexed by any search engines or “made publicly accessible in any discoverable way”.

Robots files were set up to “stop any respectful bots from accessing any part” of the sites, but when Cloudflare questioned Perplexity AI about the new domains, the platform still responded with “detailed information” regarding their exact content.

“This response was unexpected, as we had taken all necessary precautions to prevent this data from being retrievable by their crawlers,” said Cloudflare.

The same tests were then conducted with ChatGPT, which, by contrast, stopped crawling when it was disallowed.

According to Cloudflare, Perplexity also used a “generic browser intended to impersonate Google Chrome on macOS” when its declared crawler was blocked by a web application firewall.

This “undeclared crawler” would rotate through multiple IP addresses not listed in Perplexity’s official IP range, while Perplexity reportedly lodged requests from different networks in “attempts to further evade website blocks”.
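To see why a browser-style user agent arriving from unlisted addresses is hard to pin on any one company, consider a simplified check that looks only at the two signals described above: the name a request declares and whether its IP address falls inside a published range. This is a minimal sketch, not Cloudflare's detection method. The agent tokens come from the article, while the IP range (a reserved documentation block) and the sample requests are hypothetical.

```python
import ipaddress

# Hypothetical "official" crawler range, using a reserved documentation block.
DECLARED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]

# Declared crawler names mentioned in the article.
DECLARED_AGENT_TOKENS = ("PerplexityBot", "Perplexity-User")


def classify(user_agent: str, remote_ip: str) -> str:
    """Classify a request using only its user agent and source IP.

    Returns "declared" when the request names a declared crawler and comes
    from a listed range, "spoofed" when it names one but comes from
    elsewhere, and "undeclared" when it names no declared crawler at all.
    """
    ip = ipaddress.ip_address(remote_ip)
    claims_identity = any(
        token.lower() in user_agent.lower() for token in DECLARED_AGENT_TOKENS
    )
    in_listed_range = any(ip in network for network in DECLARED_RANGES)

    if claims_identity and in_listed_range:
        return "declared"
    if claims_identity:
        return "spoofed"
    return "undeclared"


# Illustrative requests (assumptions, not real traffic).
print(classify("Mozilla/5.0 (compatible; PerplexityBot/1.0)", "192.0.2.10"))
print(classify("PerplexityBot", "198.51.100.7"))
print(classify("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/124.0.0.0", "203.0.113.5"))
```

On these two signals alone, the last request looks the same as an ordinary visitor using Chrome, which is why attributing such traffic requires additional evidence beyond the user agent and the address.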

“This activity was observed across tens of thousands of domains and millions of requests per day,” said Cloudflare.

AI crawlers threaten the web

In April, Perplexity was identified alongside Anthropic, DeepSeek and TikTok as one of many vendors driving regular surges in requests to scrape website data. Cloudflare, meanwhile, found around 30 per cent of global web traffic in July was coming from bots.

Dana McKay, associate dean of interaction, technology and information at RMIT’s School of Computing Technologies, said that on top of straining hosting environments and risking downtime, AI crawlers can threaten legitimate user traffic.

“Not only are AI companies crawling your site, but they’re taking the material too, meaning you’re not getting the benefit of people visiting your site,” McKay told Information Age.

“AI platforms typically don’t want to point to your site, they want to replicate it so users never have to go there.”

McKay explained Perplexity is “slightly less” problematic than other AI crawlers in this regard because it provides links to the content it references.

“With OpenAI, for example, you’re probably not going to get any links to the original material,” she said.

McKay explained this could be particularly detrimental for sites that rely on affiliate links or advertisements.

“They’re essentially killing the ‘golden goose’,” she said.

“It'll be okay for a couple of years – government websites will continue to exist and there will be some people who will create new web content just for AIs – but there will be a lot of people who have long provided information online for free that just won’t be able to continue doing it.”

Perplexity fires back

Perplexity unabashedly rejected Cloudflare’s findings, arguing “modern AI assistants” are “fundamentally” different to traditional web crawlers.

Instead of systematically visiting millions of pages to build massive databases, Perplexity said “user-driven agents” fetch content only when a user requests something specific – similar to how Google scrapes web content to preview search results.

“When companies like Cloudflare mischaracterise user-driven AI assistants as malicious bots, they're arguing that any automated tool serving users should be suspect – a position that would criminalise email clients and web browsers, or any other service a would-be gatekeeper decided they don’t like,” wrote Perplexity.

“This overblocking hurts everyone.”

Perplexity said it uses web-crawled content immediately to answer user questions, and neither stores the data nor uses it to train its agent.

The company also argued Cloudflare’s technical findings were incorrect.

“It appears Cloudflare confused Perplexity with 3-6m daily requests of unrelated traffic from BrowserBase, a third-party cloud browser service that Perplexity only occasionally uses for highly specialised tasks,” wrote Perplexity.

The AI vendor argued Cloudflare’s systems – which are used by roughly a fifth of all websites on the internet – are “fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats”.

In a statement to Information Age, Cloudflare said that “rather than addressing their actions”, Perplexity's response attempted to deflect attention by “broadening the discussion to all AI agents”.

“Our point remains specific: content creators should have the right to control access to their content,” said Cloudflare.

“We believe Perplexity's admitted practices undermine this fundamental right.”