Photographs of Australian children have been used to train generative artificial intelligence systems without their consent after they were included in a popular image dataset, an investigation by Human Rights Watch has found.

The non-profit advocacy group said its analysis of a sample of 5,850 links to images in the LAION-5B dataset — which has been used to train many popular AI models — found it contained links to 190 identifiable photos of Australian children.

The organisation said there were likely many other links to similar images in the dataset, as its review covered less than 0.0001 per cent of the 5.85 billion image links and captions contained in LAION-5B.

The photos were often easily traceable to a certain time or location, and some included the children’s names in the caption or image URL, Human Rights Watch said.

“One such photo features two boys, ages 3 and 4, grinning from ear to ear as they hold paintbrushes in front of a colorful mural,” the group said.

“The accompanying caption reveals both children’s full names and ages, and the name of the preschool they attend in Perth, in Western Australia.

“Information about these children does not appear to exist anywhere else on the internet.”

Human Rights Watch said the images it found included photos of children being born, girls in swimsuits at a swimming carnival, and photos of First Nations children.

The group said that while some of the images had been posted to the internet by children themselves, others appeared to have been uploaded by family members, schools and photographers to blogs and photo-sharing sites, yet Human Rights Watch said it could not find the photos on publicly available versions of those sites.

LAION, the German research organisation which manages the dataset, said it disputed that LAION-5B included links to content which was not available on the public internet.

"Any information obtained by Human Rights Watch is publicly available, though for some reason unknown to us, they would like to pretend it is not the case,” the group said in a statement to Information Age.

LAION said it had removed links to images of Australian children found by Human Rights Watch, but added that this would not remove the “actual original images hosted by the responsible third parties on [the] public internet”.

“As long as those images along with private data remain publicly accessible, any other parties collecting data will be able to obtain those for their own datasets that will remain closed in most cases,” it said.

LAION said that unlike some organisations, it made its datasets “available for public scrutiny and safety checks” so that people could take steps to remove any original images they were worried about.

German non-profit LAION denies many of the claims made by Human Rights Watch. Photo: Shutterstock

Concerns photos could be used in deepfakes

Human Rights Watch said it believed datasets like LAION-5B were sometimes used to create “malicious deepfakes that put even more children at risk of exploitation and harm” or caused their data to be leaked.

In June, dozens of girls from a Melbourne school reported that their social media photos had been used to create sexually explicit deepfakes of them, leading to the arrest of a male teenager — but it remains unclear which systems the alleged offender used.

Human Rights Watch claimed the presence of links to photos of Australian children in LAION-5B amplified “the existing risk children face that someone will steal their likeness from photos or videos of themselves posted online and use AI to manipulate them into saying or doing things that they never said nor did”.

Hye Jung Han, a children’s rights and technology researcher at Human Rights Watch, said children should not have to fear that “their photos might be stolen and weaponised against them”.

“Generative AI is still a nascent technology, and the associated harm that children are already experiencing is not inevitable,” she said.

“Protecting children’s data privacy now will help to shape the development of this technology into one that promotes, rather than violates, children’s rights.”

Human Rights Watch also said AI models risked causing harm to Indigenous Australians, as many First Nations peoples avoid reproducing images of deceased people during periods of mourning.

‘Parents should behave responsibly’

LAION denied Human Rights Watch’s claims that AI models trained using LAION-5B could leak private information or easily reproduce identical copies of the material they were trained on.

The non-profit said it was also impossible to draw conclusions about possible impacts on child safety from the “tiny amounts of data analysed by Human Rights Watch”.

“Any claims based on very small, handpicked samples analysed without proper statistics at larger scales cannot be used for any serious conclusions, and are certainly not representative,” it said.

LAION said the most effective way to maintain the safety of children was to remove their content from the public internet, and for parents to be wary of the information they share about their kids.

“Parents should behave responsibly and not post private sensitive data related to their children on [the] public internet, where it can be easily collected by various parties,” LAION said.

The organisation added that it encouraged the use of awareness campaigns “to prevent irresponsible posting of sensitive information”.

Stanford researchers found hundreds of images of child sexual abuse material (CSAM) in the LAION-5B dataset in 2023.

The dataset was briefly taken offline and some content removed before it was made public again.

Other studies have previously found problematic content in LAION-5B and other public datasets, including links to images and captions featuring abuse, pornography, and racist and ethnic slurs.

Many generative AI systems are at least partly trained on publicly available data.

Meta, the owner of Facebook and Instagram, has itself used data from public posts by its Australian users to train its AI systems, a practice Brazil’s government prohibited for its own citizens this week.

Hopes for Australian reforms

Human Rights Watch said it expected the federal government’s upcoming Privacy Act reforms, due in August, to include a draft of Australia’s first child data protection law, known as the Children’s Online Privacy Code.

“This code should protect the best interests of the child, as recognised in the United Nations Convention on the Rights of the Child, and their full range of rights in the collection, processing, use, and retention of children’s personal data,” the organisation said.

It added that the code “should also prohibit the non-consensual digital replication or manipulation of children’s likenesses”.

“And it should provide children who experience harm with mechanisms to seek meaningful justice and remedy,” it said.

In June, the Attorney-General introduced a bill banning the non-consensual creation or sharing of sexually explicit deepfakes of adults, while such imagery of children would continue to be treated as CSAM.

Human Rights Watch said that this approach missed “the deeper problem that children’s personal data remains unprotected from misuse, including the non-consensual manipulation of real children’s likenesses into any kind of deepfake”.

“Australia’s government should also ensure that any proposed AI regulations incorporate data privacy protections for everyone, and especially for children,” it said.