A leaked database containing detailed profiles of 35,000 Australians shows the extent of China’s foreign data surveillance regime.

Known as the Overseas Key Information Database (OKIDB), the database was created by Shenzhen company Zhenhua Data, which reportedly had clients in Chinese military and intelligence agencies.

According to media reports, the database contains personal information on high-profile Australians including politicians, military personnel, business leaders, and technologists like Atlassian founders Mike Cannon-Brookes and Scott Farquhar.

Professor Christopher Balding of Fulbright University Vietnam and Robert Potter, co-founder of Canberra security firm Internet 2.0, worked with a global group of journalists to disclose their findings about the database.

“Here we provide the first direct evidence of data collected by China on its monitoring and data collection on foreign individuals and institutions for purposes of intelligence and influence operations,” Balding and Potter said in a co-authored paper.

“The unique blend of civil-military fusion pushed by China that works with private firms to engage in state policy activities such as intelligence gathering should be concerning.

“Foreign individuals and institutions working in sensitive or influential sectors need to be aware of how China is targeting them for influence operations.”

Balding got the database from a source in China before enlisting the help of Potter and Internet 2.0 to verify and extract the data.

The team at Internet 2.0 were able to recover 10 per cent of the database, which had files on roughly 2.4 million people around the world.

Of the roughly 250,000 files recovered, 50,000 were on American citizens and 35,000 were on Australians.

Along with profiles on prominent Australians and their families, the OKIDB also tags people of “special interest” or who are “politically exposed”.

“Designed to assist the Chinese government, security, and intelligence services, OKIDB adds in multi-layered functionality to help target and link individuals,” Balding and Potter said.

“Though not extensive, we found analyst notes about certain targets. Certain indexes had classifiers for individuals or institutions such as importance.

“It also assisted in a variety of relationship mapping. For instance, it recorded broad family relationships and work history.

“It also had other more complex big data capabilities that allowed relationship and network mapping from business networks to personnel linked to a carrier ship to social media influencers.”

Big data

Data collection has been at the heart of a stoush between the White House and Chinese-owned social media phenomenon TikTok, which US President Donald Trump moved to ban last month.

“This data collection threatens to allow the Chinese Communist Party access to Americans’ personal and proprietary information,” Trump said in his executive order.

Yet most of the information collated in this recently unveiled database was publicly available through US-owned social media platforms like Facebook, Twitter and LinkedIn.

The issue of third-party data scraping came to the fore during the Cambridge Analytica scandal and again with the discovery of controversial facial recognition company Clearview AI, which is built on a database of images pulled from the web without users’ consent.

Balding and Potter estimate that only between 10 and 20 per cent of the data in OKIDB was “not publicly or easily available from public sources”. The ABC reported that some of this non-public information appeared to have been gathered from sources like job applications and bank records.

“We have reason to believe some of the data comes from unauthorised data access such as hacking but we cannot be certain,” Balding and Potter said.

“Non-open source data had a tendency to tie to higher security individuals but not always.”

Public data

Even though the discovery of China’s surveillance database led Monday’s news cycle, not everybody was convinced it was big news.

Cybersecurity journalist Jeremy Kirk claimed to have found the OKIDB in an “unsecured elasticsearch cluster” back in January.

Taking to Twitter, Kirk said there was little on the database that was particularly sensitive.

“If you put it on social media and your privacy settings are open, well, you’ve been warned for ages that this was a bad idea,” he said.

“Zhenhua Data sold access to this information, just like many other companies that specialise in this kind of intelligence.

“You could subscribe then log in … It wasn’t trying to hide. And it wasn’t very good at protecting its own data.

“This doesn’t compare in any sense with what professional intelligence agencies do. Don’t panic. They couldn’t even secure an elasticsearch cluster right.”