Once upon a time, as recently as yesterday, I lived in a world in
which Google Translate was an imperfect but useful tool. I could ask
it my best guess of what someone had said to me, and it would spit
out a decent explanation of what we were talking about. I could point
my phone camera at a wall-label in a museum, and out would come the
information I was reading about in a language I can speak. This all
was incredibly useful, particularly on my Asian trip last summer.
Another thing I
happened to use Google Translate for was as a short-cut in my
research. Now, I’ve been trained up with the best of them. I know
that looking at the original language of, say, a medieval charter is the best
and most accurate way to understand that document’s meaning.
Nevertheless, when working at volume, it can be handy to skim, and
while I can get in the groove with modern German, my medieval
Alemannic dialect reading is slower-paced. If I want a really fast
assessment of something, there’s nothing like my native tongue,
which is English, as you’ve probably guessed by now.
So, when looking
over the roughly 200 charters relevant to the current chapter, I’ve
been going through them quickly via Google Translate to see if
there’s utility in doing the close-up work of line-by-line and
word-by-word reading. About one out of every five has a topic of
particular interest. I can skim a 69-line, whole-side-of-a-cow-sized
parchment charter in its janky English translation in about 10
minutes. I can read said document directly in something more like 45 minutes.
Let’s think about
the math:
- To skim in English: 10 x 200 = 2,000 minutes, or roughly 30 hours of reading.
- To read the medieval Alemannic: 45 x 200 = 9,000 minutes, or roughly 150 hours of reading.
Okay, I’ll even be
fair; add back another 20 hours for going through the targeted
documents in detail and I’m still looking at the difference between
50 hours of work and 150 hours of work.
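(For anyone who likes the back-of-the-envelope arithmetic spelled out, here's a quick sketch in Python. The per-charter times and the one-in-five hit rate are just my rough estimates from above, and the exact totals come out a touch higher than my round numbers.)

```python
# Back-of-the-envelope reading-time arithmetic (rough estimates, not measurements).
total_charters = 200
skim_minutes = 10    # skim one charter via its janky English translation
read_minutes = 45    # read one charter closely in the original Alemannic
hit_rate = 1 / 5     # roughly one charter in five merits close reading

skim_all_hours = total_charters * skim_minutes / 60    # 2,000 min, about 33 hours
read_all_hours = total_charters * read_minutes / 60    # 9,000 min, 150 hours

# Extra cost of going back to read the interesting ~40 charters in full,
# beyond the time already spent skimming them (roughly the "20 hours" above).
close_read_extra_hours = total_charters * hit_rate * (read_minutes - skim_minutes) / 60

print(f"Skim everything in translation:  {skim_all_hours:.0f} hours")
print(f"Read everything in the original: {read_all_hours:.0f} hours")
print(f"Skim, then close-read the hits:  {skim_all_hours + close_read_extra_hours:.0f} hours")
```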
Why am I heated up
about this topic? Well, they broke Google Translate last night.
Let me say that
again, with all the feels:
THEY BROKE GOOGLE TRANSLATE LAST NIGHT.
I have receipts, of
course. I’m going to share just one, because it’s been a long and
stressful day this morning (bwahaha).
Here’s a clause
out of one of my documents:
3. brieff Alsz dann der vorgemelt keb hailig Santgall unnser hußsatter Jarlichen ain Suma gebt Im den vigrechten der gestifften Jorlichen Jarzeten
Here’s its
translation, as of yesterday:
3. Furthermore, the aforementioned abbey of Saint Gall, our patron, shall pay annually a sum to the vicar for the proper observance of the established annual memorial services.
Usable, right? Tells
me the basics of what’s going on. Is it elegant? No. Is it fully
accurate? Also no. It is, I think we’d all agree, a janky
translation. (Oxford definition of janky: “of extremely poor or
unreliable quality.”)
But here’s the
thing: this janky translation is USABLE. It tells me whether or not
this is a place I want to spend some of my precious minutes. I mean,
I like down time just like everyone else; these translations are a
shortcut!
But no, it wasn’t
getting enough time-on-the-page, I guess, so Google “improved”
(and I use that word with scare quotes for a reason, so be scared, be
very very scared) its translation tool. Let’s look at the result,
shall we?
3. When the aforementioned [name omitted], the [name omitted], gives our [name omitted] an annual sum in accordance with the established annual [terms omitted].
This is predictive
technology gone bad. The AI underpinning here is obvious. The
“improved” tool is happy to predict anything that’s sort of
standard in a regular document of this type. But all, all, ALL of the
interesting details are now redacted. Because names, and places, and
specific amounts of money are NOT predictable. So I guess we
shouldn’t need to see them, eh? Because everything useful in life
is predictable. (Mad, me mad? Whatever do you mean???)
And this, this is
what they’re calling the “classic” version of the tool. Not
that it bears any resemblance to what the tool was doing yesterday,
of course. But it’s a handy marketing ploy for a company that
clearly Does Not Give A Shit about the user experience. The advanced
version, well, it simply redacted lines 6 to 9 of my document
altogether since those are just like line 5, a list of payments to
particular chaplains.
But MY study is
looking (in part) at exactly that. I need to know how much more the
parish priest gets than the altarist at the St Mang altar. It’s
part of my evidence. And it changes over time. Oh, which makes it
unpredictable.
So when we premise
translations on what words mean, we get one kind of information.
Yesterday, I might argue with whether the “Mesner” was better
translated as a “sacristan” or a “sexton.”
In the land of
predictive AI, however, we premise translations on what other texts
think might come next, and that means skipping the “minutiae.” The
result? I can no longer tell from the translation that the Mesner,
whatever his role might be, was even present in the document.
A bad translation is something I can argue with; a predictive
omission is something I can’t even see.
This is arguably
great if you’re translating prose. It’s an absolute disaster if
you’re looking at legal records and payments and guidelines for the
foundations. Those kinds of documents are actually designed to deliver
the very small, unpredictable details that AI wants to suppress. They
are accounting devices, legal instruments, and memory machines. It’s
like AI trying to tell you what flavor of ice cream is your favorite
based on other people’s orders. It has absolutely, positively no
idea of what *you* might want, but that won’t stop it from trying in
that oh-so-confident voice.
Janky, bad
translations, in other words, are part of my world of work. They have
a use. They may be inelegant, but their very bumps and hiccups are
pointers to curious oddities. They keep the text visible as a
text. As a user, I still see
names, sums, offices, altars, weird textual repetitions – the very
things that are likely innovations in this particular textual
example. Predictive smoothing, by contrast, is a lie of fluency. It
gives you the shape of a charter without its substance. To put it
another way, jankiness is epistemologically honest. It doesn’t pretend to understand more
than it does.
Cory Doctorow has
brought us the concept of “enshittification,” the reality that a
captured audience is merely monetary potential to the big firms that
think they own our data. And yes, this update is truly, truly, truly
the enshittified version of what a translator is supposed to do. In
fact, from where I’m sitting, this is not even translation anymore.
It’s instead content abstraction masquerading as translation. A
translator is accountable to the source text; a predictive
model is accountable to statistical plausibility. Honestly, I have
trouble communicating just how BAD it is at the job it was perfectly
adequate at yesterday, but you get the general gist.
And the reality is
that an enshittified product is pretty much what you’re stuck with
from here on out, unless Google changes its mind and rolls back to
yesterday’s model.
Happily for me, I
can, in fact, read my texts. I have access to good dictionaries, and
I do subscribe to DeepL for toggling languages with modern German.
(DeepL struggles *hard* with Alemannic, but then, don’t we all?)
And in a pinch, ChatGPT actually does a decent job with the odd
sentence or two.
But the fact that
yesterday was easy, and today my tool is broken? This is the way of
this tech-heavy world of ours. Because yesterday’s Google Translate
assumed that you were the expert deciding what mattered.
Today’s assumes the model knows better. That’s not just
frustrating; it’s a quiet and very, very creepy reordering of
authority in knowledge production. Scholars of thin archives (like
the ones I work on in Bregenz, Austria and in Bischofszell,
Switzerland) are exactly the ones who lose when the world (or the
tech-companies) decides that unpredictability is noise. Because the
unpredictable is often where the truth lies.