Asylum Text Analytics as an Algorithmic Silver Bullet: The Impossible Quest for Automated Fraud Detection

(Image: U.S. border wall overlaid with a street pole bearing a sticker that reads "Refugees Welcome")

(By Jeremy A. Rud, guest author)

What do Donald Trump, George Santos, and every migrant applying for asylum in the United States have in common? They’ve all recently been charged with fraud.

Rather than defaming abuse victims, defrauding campaign donors, or lying to Congress, asylum seekers face much subtler accusations: that their stories are untrue and their experiences insufficient to deserve life in the United States as refugees.

Our global system of rigid national borders does more than divide territories. It also restricts access to resources and human rights by categorizing people. In many ways, our political and social institutions make this unavoidable. An immigration system that legally distinguishes refugees from other migrants requires categorization. Categorization requires a comparison between an individual’s claim and a legal definition, which results in an asylum adjudication: an official decision about whether that individual belongs to the category of “refugee” and therefore gains access to the material benefits and physical safety that come with it.

Asylum adjudication is a complex, contentious process, and understandably so. Asylum claims are multifaceted and shaped by individual experiences and traumas. For an applicant, the difference between a grant and a denial of their asylum claim can mean the difference between a path to legal residence, employment, and citizenship and a path to deportation, persecution, and even death in their country of origin.

In the face of these complexities and stakes, United States Citizenship and Immigration Services (USCIS) recently announced their use of an “Asylum Text Analytics” program that may seem at first glance like an algorithmic silver bullet, yet it has little empirical foundation in either linguistic or anthropological science. By automating any part of the narrative evaluations that determine asylum claims, USCIS has doubled down on longstanding systemic biases, misconceptions about how credibility shows up in language, and xenophobic assumptions that some migrants are truly deserving of asylum—but most are lying.

(Image: a fictional plagiarism detection software interface)

If Asylum Text Analytics is the Solution, What’s the Problem?

In August 2022, USCIS officially announced their use of “Asylum Text Analytics” to “identify plagiarism-based fraud” as a method of prescreening applications for asylum in the U.S. Since at least 2019, USCIS has slowly and quietly revealed their plan to “scan all incoming applications and use text analytics to look for boilerplate language and other patterns or anomalies to flag potentially fraudulent applications.” The origins of this program can be traced back further still to the National Vetting Enterprise, a program of intensive intelligence-gathering on all foreigners entering the U.S. that former President Donald Trump initiated, notably through three “Muslim travel ban” executive orders in 2017.

If you’re left wondering what this means or how text analytics actually works, you’re not alone. USCIS has been vague about the program’s methods and goals. A close analysis of existing USCIS disclosures is telling, however, so let’s break it down.

First, text analytics, or text mining, refers to the use of algorithmic technologies to detect patterns and extract data from large bodies of text. It’s important to note that text analytics is not necessarily a form of artificial intelligence (AI), that is, autonomous and/or self-learning computation that performs tasks that would otherwise require human intelligence to execute successfully (Taddeo and Floridi 2018).

To understand what USCIS means by “plagiarism-based fraud,” we can make a comparison to Turnitin and other plagiarism detection software used in schools. When teachers use this software on an essay, it scrapes the web and repositories of previously submitted academic content to catch students who copy their peers or claim language from other sources as their own. Educators and students alike have voiced their concerns over Turnitin’s accuracy and argued that its use violates copyright, intellectual property, and data privacy laws. USCIS’s use of automated tools to detect “plagiarism-based fraud” will likely face similar criticisms and litigation.
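USCIS has not published its methods, so the following is only a minimal sketch of one generic technique that plagiarism detectors commonly build on: counting the word sequences that two texts share. The function names, the three-word window, the 0.5 threshold, and both sample sentences are assumptions invented for illustration; none of them comes from USCIS or Turnitin.

```python
import re


def shingles(text: str, n: int = 3) -> set:
    """Return the set of n-word sequences ("shingles") found in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Share of distinct n-word sequences that two texts have in common."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0  # a text shorter than n words cannot be compared
    return len(sa & sb) / len(sa | sb)


def flagged(a: str, b: str, threshold: float = 0.5) -> bool:
    """Flag a pair as 'suspicious'. The threshold is a human choice (assumed here)."""
    return jaccard(a, b) >= threshold


# Invented sample sentences: two different claims, one formulaic opening.
SAMPLE_A = ("I fear I will be persecuted if I return to my country "
            "because of my political opinion.")
SAMPLE_B = ("I fear I will be persecuted if I return to my country "
            "because of my religion.")

score = jaccard(SAMPLE_A, SAMPLE_B)
print(f"overlap: {score:.2f}, flagged: {flagged(SAMPLE_A, SAMPLE_B)}")
```

In this toy case the two invented applicants make different claims (one political, one religious) yet share 13 of 16 distinct three-word sequences, a score of about 0.81, because both open with the same formulaic phrasing. Whether that overlap counts as "plagiarism" depends entirely on a threshold that a person has to choose.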

As other disclosures have shown, not only will USCIS use algorithmic technologies to identify “plagiaristic” language that is directly copied, but also to “detect patterns that could constitute indicators of fraud, national security, and/or public safety concerns” and “flag potential fraud when applicants’ stories don’t align,” according to former USCIS Chief Technology Officer Rob Brown.

To be sure, I’m not advocating that asylum applicants should be allowed to copy others’ stories word-for-word. But does “boilerplate language”—which USCIS has named as a main concern—obviously amount to plagiarism? And what other “patterns or anomalies” constitute fraud according to USCIS? Anomalies for whom, and against what background assumptions about how a “normal” application ought to look and how a story of persecution should be told?

Anyone who studies language, culture, and society knows that human communication is filled with patterns. We can follow patterns or break them deliberately in order to convey our message. We’re often taught that grammar is a set of rules we must follow, but really it’s a way to describe one big pattern that members of a language community can generally expect to be followed. This general predictability is what makes communication possible for members of a language community, whose implicitly shared knowledge of linguistic patterns allows them to communicate with others. Grammatical patterns structure our communication from the level of speech sounds and word parts to whole genres and societal discourses.

For people who don’t study language, culture, and society, it’s often difficult or even impossible to accurately describe the implicit patterns they follow when they communicate linguistically. Unsurprisingly, this gets even harder when it comes to other languages; we often project negative social views rooted in racism and colonialism onto others’ language patterns when we do not immediately recognize or understand them. This presents an immediate problem for USCIS’s Asylum Text Analytics: human officials must program the algorithm to select which linguistic patterns in an application to flag and which to ignore, and these human decisions are often unknowingly or covertly rooted in prejudice.

So, what constitutes fraud according to USCIS officials? It seems that the answer is not only plagiarism, but any language that follows or deviates from a pattern that they’ve chosen to program the algorithm to detect. Most alarmingly, fraud in an asylum application would include evaluators’ implicit (and likely biased) judgments of whether the multiple versions of the stories that migrants are forced to tell to prove their worthiness do or don’t “align.” Officials who wield such technology in their decision-making not only overlook what scientists know about how narratives of trauma are structured and performed, but also blatantly fail to consider the immense diversity of linguistic and cultural practices that is inherent to an asylum system in a globalized world. After all, migrants come to the U.S. from every culture and language community, often without knowledge of English (or without literacy in any language). Why would we expect them to meet unknowable linguistic standards founded on shoddy theories and unchecked biases?

Of course, USCIS won’t reveal the patterns of language they’re looking for or publish the algorithm’s code; they are too afraid that migrants are threats or simply undeserving, and are therefore “gaming the system.” Without any public or scientific oversight, this gives USCIS carte blanche to deny any application they want under a veneer of algorithmic objectivity. For all these reasons, there is little basis for believing that Asylum Text Analytics will actually succeed, either at detecting fraud or at honoring the human rights of asylum-seeking migrants that our country agreed to uphold.

(Image: an asylum interview in progress)

Constructing Asylum Narratives

What is an asylum narrative, and how does it fit into the asylum-seeking process? To receive asylum and “refugee” status in the United States, applicants must prove they face “persecution or a well-founded fear of persecution on account of race, religion, nationality, membership in a particular social group, or political opinion” (8 U.S.C. § 1101(a)(42)) in their countries of origin. In addition to biographic questions, Form I-589, “Application for Asylum and for Withholding of Removal,” requires that the applicant give written responses to a series of open-ended questions about their past or feared persecution based on at least one of the identity traits listed above. Applicants may also attach an optional “declaration” document (“their story” as they wish to present it, without the question-answer format) as well as other documents to serve as supplementary evidence, including written documentation of persecution or threats, medical documents, police reports, and published reports of human rights abuses.

This might all seem straightforward enough. But after journeying for hundreds if not thousands of miles, often on foot, migrants frequently carry little evidence with them other than their stories. Their ability to convincingly narrate their fear of identity-based persecution, in written English and to white-collar officials, thus becomes paramount. Seeking and receiving asylum requires that migrants successfully navigate not only physical and geopolitical borders, but also borders of language, culture, and literacy. These sociolinguistic differences, between the ways migrants speak in their own language communities and the ways of speaking and writing expected by the U.S. immigration system, can be harder to detect and trace but are nevertheless widespread and powerful.

For example, media representations and societal discourses of “refugees” shape our attitudes toward migrants, often subconsciously; implicit biases can influence not only how evaluators read migrants’ narratives, but also how they hear migrants when they speak in interviews and hearings. But listeners’ biases are just one way that the influence of other interlocutors on a migrant’s story is backgrounded. The applicant is judged as the sole author of their application despite their countless interactions with, and advice from, smugglers, advocates, lawyers, judges, and other migrants with whom they have conversed on their journey. More influential still are translators and interpreters, who must wrest these stories of trauma and fear from oral conversations in migrants’ home languages to produce cogent written English narratives as demanded by the institution. All the while, officials make scrutinous comparisons between each iteration of an applicant’s story while actively searching for “threats” and “fraud.”

The reality is that language deeply affects how we hear migrants’ stories, from the most macro, discursive level to the most micro and interactional, and these forces often remain undetected. For these reasons, scholars in many fields have long studied asylum seekers’ narrative performances in relation to the broader sociopolitical discourses that are inherently skeptical of the legitimacy and credibility of these performances and the claims within (Blommaert 2001; 2009; Daniel and Knudsen 1996; De Fina and Tseng 2017; Eades 2005; Jacobs and Maryns 2021; Malkki 1992; Maryns 2012; Smith-Khan 2017). This extensive body of work has repeatedly demonstrated the immense influence that linguistic minutiae come to exert on asylum seekers: they must establish “credible fear” via culturally, linguistically, and contextually appropriate narrative performances in the face of mistrustful institutions with little accommodation for linguistic and cultural difference. Their asylum applications, and thus their futures and lives, depend on it.

(Image: an American flag and Form I-589 with accept and reject icons)

The Ethnocentric Underpinnings of “Fraud Detection”

What does all this mean for asylum applicants in practice? To answer this question, I will draw on some prominent examples from previous scholarship as well as my own research, for which I have conducted 17 months of ethnographic fieldwork with a mutual aid organization that assists migrants with their asylum applications. At these pro se asylum clinics, migrant applicants, volunteer preparers, and attorney reviewers work together to record migrants’ stories on Form I-589.

The examples I’ve selected center around issues of time and chronology in asylum narratives, which are some of the most scrutinized aspects of an applicant’s story. Perceived contradictions and inconsistencies in the ordering of events in an applicant’s story—or between the written and oral tellings of the story—are considered indicators of its fabrication. The great breadth of linguistic and cultural differences in this domain, and the way trauma affects it, thus pose one of the greatest sociolinguistic challenges to applicants.

For example, the linguist Anna De Fina (2003) found that Mexican migrants narrated the beginnings of their journeys to the U.S. with detailed time references such as specific days of the week, month, and year, yet such references became increasingly vague or absent as the stories went on. She argued that this pattern correlates with migrants’ degree of control over their movement: migrants who had less agency over their displacement, or who knew less about their destination, used more of the linguistic indicators that signal a lack of agency. She further argued that we should adopt a processual understanding of narrative and examine how narratives emerge under specific socio-historical conditions rather than treating them as finished products with a fixed orientation and structure.

An even more shocking example comes from sociolinguist and linguistic anthropologist Jan Blommaert (2009), who details the incredible journey of Joseph, a migrant from Rwanda who was denied asylum in the United Kingdom due to disputes over his age and nationality. Joseph self-identified as 14 yet was declared to be over 18 by a medical officer after a quick examination. Not only was Joseph channeled through the asylum system as an adult rather than an unaccompanied minor, but he was also forced to account for four extra years in his story. The “unaccounted” years introduced numerous discrepancies throughout his narrative that increasingly contributed to officials’ skepticism toward his asylum claim.

In my own research, I’ve seen many Afghan asylum seekers face similar challenges. Afghanistan has officially used several different calendars in the last hundred years, depending on the political group in power, rather than the solar Gregorian calendar ubiquitous in the West. When the Taliban again took power in 2021, they reimposed the lunar Hijri calendar and effectively changed the year from 1401 to 1444 overnight. What’s more, most Afghans (like many migrants from other regions with decades of conflict) often don’t receive formal birth certificates or other documents that include a birth date. Successive regime changes and Afghan cultural traditions that emphasize other celebrations mean that most Afghan asylum seekers don’t know the specific day they were born. This is just one small example of a much larger challenge in applying for asylum. In this case, Afghan applicants must mentally transpose their life stories across multiple calendars, relying on details they had never before considered significant.
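To make the calendar arithmetic concrete, here is a rough sketch of how a single year number maps onto Gregorian dates under each system. It is an approximation under stated assumptions, not an official conversion: the solar Hijri new year is pinned to 21 March, the lunar year is treated as a fixed average length counted from 19 July 622, and every function and constant name is invented for illustration. Real dates can differ by a day or two.

```python
from datetime import date, timedelta

# Assumption: the lunar Hijri year averages about 354.367 days, counted from
# an epoch of 19 July 622 in the proleptic Gregorian calendar.
LUNAR_YEAR_DAYS = 354.36707
LUNAR_EPOCH = date(622, 7, 19)


def solar_hijri_year_span(year: int) -> tuple:
    """Approximate Gregorian first and last day of a solar Hijri year.

    Assumption: the new year (Nowruz) always falls on 21 March.
    """
    return date(year + 621, 3, 21), date(year + 622, 3, 20)


def lunar_hijri_year_span(year: int) -> tuple:
    """Approximate Gregorian first and last day of a lunar Hijri year."""
    start = LUNAR_EPOCH + timedelta(days=int((year - 1) * LUNAR_YEAR_DAYS))
    next_start = LUNAR_EPOCH + timedelta(days=int(year * LUNAR_YEAR_DAYS))
    return start, next_start - timedelta(days=1)


def contains(span: tuple, day: date) -> bool:
    """True if a Gregorian day falls inside an approximate year span."""
    return span[0] <= day <= span[1]


for label, span in [("solar Hijri 1401", solar_hijri_year_span(1401)),
                    ("lunar Hijri 1444", lunar_hijri_year_span(1444)),
                    ("solar Hijri 1370", solar_hijri_year_span(1370))]:
    print(f"{label}: {span[0].isoformat()} to {span[1].isoformat()}")
```

Under these assumptions, a day such as 1 September 2022 belongs to solar Hijri year 1401 and lunar Hijri year 1444 at once, which is the gap described above. The last line of the loop shows a second problem: someone who knows only that they were born in solar Hijri year 1370 could have been born in either 1991 or 1992 by Gregorian reckoning, so a form that demands a single Gregorian birth date forces a guess.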

For example, in an interaction between a female Afghan asylum seeker and a volunteer helping her prepare her application at one of the clinics I studied, a challenge arose due to an inconsistency in her description of the timeline leading up to her departure. The most critical events in her asylum claim occurred during these weeks, which included not only the Taliban’s capture of Kabul but also the kidnapping of her son.

The discrepancy centered around the length of time that her son was taken from her; in one instance she said fifteen days, in another she said one month. “You close your eyes and open, and the government changes,” she recounted. Thankfully, when the applicant described the traumatic experience, the volunteer listened and responded to her inconsistencies with care and further inquiry, rather than disbelief. That is, rather than emphasizing fraud detection in her evaluation, the volunteer skillfully listened to the applicant’s story with a multifaceted awareness: an awareness of the institutional expectation of precise and consistent timelines, as well as an awareness that the applicant was speaking in a foreign language according to a foreign calendar and that she had experienced significant trauma which had fragmented and disorganized her recollection. By leveraging this awareness throughout the session, the volunteer and an attorney were able to employ distinct modes of listening that facilitated the applicant’s ability to document her story as she experienced it—and to do so in accordance with the critical standards of credibility that she faced.

Human Systems, Human Lives

In the documentary Well-Founded Fear (dir. Robertson and Camerini 2000), one of the featured asylum officers, Kevin, offered a rare glimpse of asylum adjudicators’ reflections on their decision-making processes:

If you’re an applicant, you play asylum officer roulette here. Your chances of getting a grant depend on who you get as much as what your claim is. Simply because, you know, everyone has got their own threshold, everyone has their own interpretation of the law, everyone has their own willingness to believe or to suspend disbelief. It’s just the way the system is. It’s a human system, and that’s the way that human people operate.

This quote shows that the outcomes of asylum adjudications are determined largely by individual considerations of language—what narrative constructions the officer is willing to believe, how the officer understands the language of the law, and the officer’s ability to imagine and believe the applicants’ linguistic depictions of their experiences. It’s important to understand that asylum adjudication remains a human system, even in the case of algorithmic fraud detection. It’s just a question of which humans program the algorithm, based on whose expectations and beliefs, and when and where human decision-making is made observable or invisible.

Even before algorithmic technologies enter the picture, the interplay between an individual’s experiences and the language they use to describe them is deeply complex. As the examples from previous scholarship and my own research show, this relationship is one that linguists and anthropologists are uniquely prepared—and to me, increasingly obligated—to explain.

The same structures of power in our political system that legitimize the voices and stories of Trump and Santos are those that discredit the voices and stories of migrants. The regimes of language within these structures not only shape the U.S. asylum system but also uphold our fears of those unlike us and fuel our confusion over the nature of stories, in addition to how and why we tell and believe them. As we work to unpack these regimes of language to understand what they do, whom they benefit, and whom they harm, I’ll leave it up to you to guess whether Trump, Santos, or migrants will suffer worse consequences for their charges of fraud.

. . . . . . . . . .

Jeremy A. Rud is a PhD candidate in linguistics at the University of California, Davis. His research broadly focuses on language in the asylum process and addresses issues of credibility and listening at intersections of public policy, narrative performance, and speech perception.
