Jayeless.net

Scots Wikipedia crisis

In August 2020, an individual posted to the /r/Scotland subreddit about a little investigation they’d done into the Scots-language Wikipedia. As they told it, it had long been a known thing among Scots speakers that the Scots Wikipedia was extremely bad (at least, among those who had enough confidence in their Scots-speaking abilities – a certain number instead assumed they just must not speak “proper” Scots). It turned out that the reason the Scots Wikipedia was so out of whack with how Scots speakers actually speak was that most of it had been written by a single American teenager who did not speak Scots at all.1 As the poster on /r/Scotland summed it up:

I think this person has possibly done more damage to the Scots language than anyone else in history. They engaged in cultural vandalism on a hitherto unprecedented scale. Wikipedia is one of the most visited websites in the world. Potentially tens of millions of people now think that Scots is a horribly mangled rendering of English rather than being a language or dialect of its own, all because they were exposed to a mangled rendering of English being called Scots by this person and by this person alone. They wrote such a massive volume of this pretend Scots that anyone writing in genuine Scots would have their work drowned out by rubbish. Or, even worse, edited to be more in line with said rubbish.

The American teenager didn’t do this with malice; he was open about being neurodivergent and having OCD, and said that editing this Wikipedia became like an obsession for him, and one where he really thought he was being helpful. Nonetheless, the ramifications of his actions are pretty far-reaching. Many people were led to dismiss Scots as poorly spelled English rather than a separate, legitimate language variety with a distinct history. Various people who actually spoke the language and tried to contribute to the Scots Wikipedia had their contributions reverted and their efforts dismissed. It’s thought that the poor writing of the Scots Wikipedia has “polluted” AIs that were trained on Scots text in general. And apparently, even government documents have managed to draw in this fake Scots language from the Wikipedia, although I’m not sure how or in what capacity.

In the immediate aftermath of the reveal, various language promotion groups and institutes organised “edit-a-thons” to try to fix up the Scots Wikipedia, and several thousand pages were just deleted. Now, eighteen months later, I’m not sure whether these kinds of efforts (or less intense, everyday wiki work) are ongoing, or whether they all petered out once the original outcry had passed.

The issue did draw attention to the plight of other Wikipedias of small languages. When a language has few speakers to begin with, the chance that many of them will be enthusiastic about maintaining a Wikipedia is pretty low, so how can Wikipedia defend against low-effort, counterproductive contributions from people who either don’t even speak the language, or have weird political motivations? For an example of what I mean by “weird political motivations”, apparently the Croatian-language Wikipedia is a neo-Nazi haven, so all normal, non-Nazi Croatians use the Serbo-Croatian Wikipedia instead. But what can, and should, Wikipedia do about that kind of situation?


  1. Well, that and, as it later turned out, none of the other early contributors from before this guy started writing articles were fluent Scots speakers either. So the original post was kind of exaggerating to single out only him as the source of the problem, when there were articles just as badly written by other people. ↩︎