
Detecting AI-Generated Text: NUARI and Norwich University Researchers Publish Study

Written by Jakon Hays | May 13, 2025 4:01:34 PM

 

We are happy to share an article by NUARI and Norwich University researchers, published in the journal IEEE Transactions on Artificial Intelligence.

The authors are Dr. Ali Al Bataineh, Director, Artificial Intelligence Center at Norwich University; Dr. Kristen Pedersen, Chief Research Officer, NUARI; Rachel Sickler, Senior Developer and Machine Learning Engineer, NUARI; and Data Scientist Kerry Kurcz.

From the paper's abstract – "Artificial Intelligence (AI) is increasingly embedded in our everyday lives. With the introduction of ChatGPT in November 2022 by OpenAI, people can now ask a bot to generate comprehensive writeups in seconds. This new transformative technology also introduces ethical, safety, and other general concerns. It is important to harness the power of AI to understand whether a body of text is generated by AI or whether it is organically human. In this paper, we create and curate a medium-sized dataset of 10,000 records containing both human and machine-generated text and utilize it to train a reliable model to accurately distinguish between the two."

Some key points from the paper:

  • Our researchers created a dataset that other researchers can use, and they automated the process so that fellow researchers can build their own datasets the same way.
  • Models that can reliably identify AI-written text will become more important as more of the internet (big tech's training ground) becomes saturated with it. If that text is not removed from training data, it will make LLMs worse, a problem known as model collapse.
  • The literature review found that the models that generated a text are the best at identifying whether it was AI-generated. Even so, our team's results showed that simple machine learning models (XGBoost, logistic regression, and random forest) performed better on our team's dataset. Many people assume that only deep learning can do a good job, but our research showed that these less complex, less computationally expensive models can do the job too, and maybe even better in the right scenario.
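
For readers curious what a "simple" model looks like in practice, here is a minimal sketch of one of the three model families named above, logistic regression, written from scratch with only the Python standard library. It is an illustration, not the paper's pipeline: the bag-of-words features, the hyperparameters, the function names, and the six toy sentences are all assumptions invented for this example, and the tiny sample says nothing about real-world accuracy.

```python
import math
from collections import Counter

# Toy, made-up examples for illustration only; NOT the paper's dataset.
# Label 1 = machine-generated, 0 = human-written (hypothetical labels).
TRAIN = [
    ("as an ai language model i can provide a comprehensive overview", 1),
    ("in conclusion it is important to note the following key points", 1),
    ("overall this comprehensive overview highlights several key points", 1),
    ("lol no idea what happened my cat knocked the router off again", 0),
    ("ugh missed the bus again so i walked home in the rain", 0),
    ("my cat hates the rain and honestly same", 0),
]


def featurize(text):
    """Bag-of-words token counts (an assumed, deliberately simple feature set)."""
    return Counter(text.lower().split())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, text):
    """Estimated probability that `text` is machine-generated."""
    feats = featurize(text)
    z = bias + sum(weights.get(tok, 0.0) * cnt for tok, cnt in feats.items())
    return sigmoid(z)


def train(examples, epochs=200, lr=0.1, l2=0.01):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            # Gradient of the log loss with respect to the logit.
            error = predict_proba(weights, bias, text) - label
            for tok, cnt in featurize(text).items():
                w = weights.get(tok, 0.0)
                # The L2 penalty keeps individual token weights small.
                weights[tok] = w - lr * (error * cnt + l2 * w)
            bias -= lr * error
    return weights, bias


weights, bias = train(TRAIN)

for sample in ("a comprehensive overview of key points", "my cat missed the bus"):
    print(f"{sample!r}: P(machine) = {predict_proba(weights, bias, sample):.2f}")
```

The whole model is a dictionary of per-word weights plus a bias, which is why this family of models is so much cheaper to train and run than a deep network. A real project would more likely use an established library implementation and far richer features than raw word counts.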

 

The full paper can be downloaded from the IEEE website.