Bad bots have taken over the internet

More than a third of Australia’s Internet traffic is generated by automated bots, according to a new analysis that found good and bad bots scraping data, training generative AI (GenAI) platforms, scalping concert tickets, and reselling premium airline seats and restaurant reservations.

Bots are being programmed to mimic legitimate human users, security firm Imperva notes in the latest annual edition of its Bad Bot Report, which analysed more than 6 trillion blocked and anonymised bad bot requests.

The firm found the Internet is being flooded with automated agents that are, the report notes, designed to compromise targets “by exploiting an application’s intended functionality and processes rather than its technical vulnerabilities.”

Bad bots have been found facilitating high-speed abuse, misuse, and attacks on websites, mobile apps, and APIs – exploiting their native capabilities to scrape large volumes of data, harvest personal and financial data, conduct brute-force credential stuffing attacks, run denial of service attacks, conduct transaction fraud, and automate digital ad fraud that has been estimated to cause $150 billion ($US100 billion) in losses.

While so-called ‘simple bots’ focus on automatically scraping large quantities of data into databases for processing or resale, advanced bad bots bundle a range of capabilities, such as being able to automatically defeat CAPTCHA challenges specifically designed to stop them.

Advanced bots – which the report notes “can achieve their goals with fewer requests than simpler bad bots and are much more persistent in staying on their designated target” – are being focused on targets in high-value industries such as law and government, entertainment, financial services, and travel. Simple bots, meanwhile, are collecting data, much of it personal, en masse from targets in people-focused industries like healthcare and education.

“Highly opportunistic” bots have, among other things, been spotted flooding ticket booking, driver’s licence test, and restaurant reservation sites to snap up scarce bookings and desirable times, then reselling them on third-party platforms.

Fully 44.8 per cent of the bots were spotted trying to avoid companies’ defences by pretending to be a mobile web browser – a tactic that has become increasingly popular in recent years – while a growing proportion of bad bots route traffic through residential Internet service providers to make it look like they are just eager home users.

“These bot operators… will take advantage of any situation where supply is scarce and demand is high,” Imperva notes.

“These bots exploit the system for profit, making it nearly impossible for genuine customers.”

Holding back the flood

Just like infamous bank robber Willie Sutton, bots go where the money is: despite Australia’s relatively small population, its acknowledged wealth means that fully 8.4 per cent of all bot traffic observed by Imperva was directed at Australian targets.

That was well behind perennial chart-topper the US (47 per cent), but on par with second-place The Netherlands (9 per cent) and about 1.6 times the proportion directed at the fourth-ranked UK (5.1 per cent).

And while they generally operate in the shadows, the growing volume of traffic that bots produce is turning heads – particularly in Australia, where bad bots now generate 30.2 per cent of all Internet traffic and good bots account for a further 6.1 per cent.
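
For readers who want to check the arithmetic behind these figures, here is a minimal sketch in Python. It uses only the percentages quoted in the article; the function and variable names are illustrative and do not come from Imperva’s report.

```python
# Percentages quoted in the article; names here are illustrative only.
BAD_BOT_SHARE_AU = 30.2   # bad bots, per cent of Australian traffic
GOOD_BOT_SHARE_AU = 6.1   # good bots, per cent of Australian traffic
AU_TARGET_SHARE = 8.4     # per cent of observed bot traffic aimed at Australia
UK_TARGET_SHARE = 5.1     # per cent of observed bot traffic aimed at the UK


def total_bot_share(bad: float, good: float) -> float:
    """Combined share of traffic from all bots, to one decimal place."""
    return round(bad + good, 1)


def share_ratio(a: float, b: float) -> float:
    """How many times larger share a is than share b, to two decimal places."""
    return round(a / b, 2)


total = total_bot_share(BAD_BOT_SHARE_AU, GOOD_BOT_SHARE_AU)
ratio = share_ratio(AU_TARGET_SHARE, UK_TARGET_SHARE)

print(f"All bots: {total} per cent of Australian traffic")
print(f"More than a third: {total > 100 / 3}")
print(f"Australia vs UK as a bot target: {ratio} times")
```

The combined figure is what sits behind the “more than a third” description at the top of the article.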

That’s around the same proportion of traffic generated by all of Google last year – 18 per cent of all fixed data and 20 per cent of all mobile Internet traffic, according to Sandvine’s latest Global Internet Phenomena Report.

In New Zealand, by contrast, just 19 per cent of traffic is generated by bad bots and a similar percentage by good bots – striking a better balance even as Australia is overrun by bad bots.

The explosion of bots is also leading to significant legal stoushes, particularly around the use of bots to find and collect the mind-boggling quantities of data required to train ever more sophisticated GenAI large language models (LLMs) that are becoming ubiquitous across social media and search services.

Amidst revelations that much of that data is copyrighted and unlicensed, claims by AI developers that their use of scraping bots falls under ‘fair use’ copyright provisions have proved problematic, setting up legal battles with the likes of the New York Times and other publishers.

AI and LLM development have “brought web scraping back into the spotlight,” Imperva notes, highlighting the difficulties that even purportedly ‘good’ bots face as legal challenges catch up with them.

“As AI continues to evolve, the need for clear legal guidelines balancing the needs for data with respect for copyright laws and privacy rights has never been more critical,” the report notes.

“The debate is far from over.”