The US intelligence community has launched a program to develop artificial intelligence that can determine the authorship of anonymous writing while also disguising an author’s identity by subtly altering their words.
The Human Interpretable Attribution of Text Using Underlying Structure (HIATUS) program from the Intelligence Advanced Research Projects Activity (IARPA) aims to build software that can perform “linguistic fingerprinting,” the Office of the Director of National Intelligence (ODNI) said.
“Humans and machines produce vast amounts of text content every day. Text contains linguistic features that can reveal author identity,” IARPA said.
With the right model, IARPA believes it can identify consistencies in a writer’s style across different samples, modify those linguistic patterns to anonymize writing, and do it all in a way that novice users can understand, according to ODNI. HIATUS AIs would also need to be language-agnostic.
“We have a strong chance of meeting our goals, delivering much-needed capabilities to the Intelligence Community, and substantially expanding our understanding of variation in human language using the latest advances in computational linguistics and deep learning,” said HIATUS program manager Dr Timothy McKinnon.
To develop strong models, HIATUS plans to treat its goals as an adversarial AI problem: authorship attribution and text anonymization are two sides of the same coin, so HIATUS research teams will be pitted against each other.
“Attribution systems are evaluated on ability to match items by the same author in large collections, while privacy systems are evaluated on ability to thwart attribution systems,” IARPA said.
The agency said it also plans to develop explainability standards for HIATUS AIs.
McKinnon said part of HIATUS’ work is demystifying neural language models, the focus of the program’s efforts. He said these models work well but are essentially black boxes: their developers don’t know why they make a particular decision.
Ideally, McKinnon said, “when we perform attribution or we perform authorship privacy, we’re able to really understand why the system is behaving the way it is, and be able to verify that it’s not picking up on spurious stuff and that it’s doing the right thing.”
If successful, HIATUS could have far-reaching impacts, from countering foreign influence activities and identifying counterintelligence risks to protecting authors whose work may endanger them, the ODNI said. McKinnon added that HIATUS AIs may also be able to tell whether text is machine-generated rather than human-authored.
IARPA itself won’t be involved in putting HIATUS to use; the agency only develops the technology rather than turning it into a usable product. Even so, the odds are in HIATUS’ favor: some 70 percent of IARPA’s completed research makes its way to other government partners for implementation, according to the intelligence agency.
Don’t expect this technology to appear in complete form any time soon, though: now that HIATUS has kicked off, it’ll be 42 months (three and a half years) until the experiment wraps up, and only then will other government agencies likely get to take HIATUS for a spin, if McKinnon and his team are successful. ®