If you are wondering why, for some languages, a few technical sentences get translated correctly while none of the non-technical ones do, I suspect this is related to technical language having a large number of borrowings from French, English or Arabic. Also, the bars are sorted in decreasing order of the number of non-technical sentences translated correctly, so the languages that have no correct non-technical translations but a few correct technical ones stand out.
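To make the ordering concrete, here is a minimal sketch of that sort; the languages and counts below are placeholders for illustration, not the actual results.

```python
# Placeholder per-language counts (not real data), just to show the sort order.
results = {
    "Hausa": {"technical_ok": 3, "non_technical_ok": 0},
    "LangB": {"technical_ok": 5, "non_technical_ok": 2},
    "LangC": {"technical_ok": 7, "non_technical_ok": 6},
}

# Sort bars in decreasing order of correctly translated non-technical sentences,
# so languages with zero correct non-technical translations (but a few correct
# technical ones) end up at the tail of the chart.
ordered = sorted(results.items(),
                 key=lambda item: item[1]["non_technical_ok"],
                 reverse=True)

for language, counts in ordered:
    print(language, counts["non_technical_ok"], counts["technical_ok"])
```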
As you know, I stand by your thesis: far from being neutral, one could say that anything created by humans carries the imprint of its creator's identity (this also holds for art, but that is another topic).
I read your article with much interest, although it seems to me that the most important component of the test is missing: the human check. Though prone to subjectivity, human confirmation and control is, from my point of view, a necessary prerequisite for this kind of test (as well as many others...). If the validation of the data is done once again by a machine, it cannot spot its own errors.
Something I observed lately: a colleague asked me about the homograph "lime" in Italian. According to the translator tool he had been using (IT->DE), the translation was either "lime, the citrus fruit" or "Datei" (which is German for a data file). Unfortunately the poor machine, using a bridge language between Italian and German, didn't spot the ridiculous error of taking "file" in its first meaning (a data file), while "file" in English also has another meaning, the file used to smooth and shape. Even when given some context, the poor AI (or, as I prefer, AS for artificial stupidity) cannot grab the correct word, e.g. the Italian "Passami le lime, devo smussare un angolo" becomes --> "Gib mir die Akten, ich muss eine Ecke abrunden" (give me the files - as in documents - I need to round off a corner).
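To make the failure mode concrete, here is a toy sketch of what bridging through English does to the word sense; the dictionaries are invented purely for illustration and no real translator works this way internally, but the collapse of the ambiguity is the same idea.

```python
# Toy illustration of sense loss when pivoting IT -> EN -> DE.
# The dictionaries are made up for this example only.

# Italian "lime" (plural of "lima", the tool) maps onto the English homograph "file".
it_to_en = {"lime": "file", "smussare": "to smooth"}

# Without sense information, English "file" falls back to its most frequent
# meaning, the document one ("Akte"), instead of the tool ("Feile").
en_to_de = {"file": "Akte", "to smooth": "glätten"}

def pivot_translate(word_it: str) -> str:
    """Translate Italian -> German by bridging through English, dropping the sense."""
    return en_to_de[it_to_en[word_it]]

print(pivot_translate("lime"))  # -> "Akte" (a document), although "Feile" (the tool) was meant
```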
Looking at this hilarious behaviour, two main points come to my mind. First of all, the "bridging" through a language that is better developed for that machine, which is in my opinion quite embarrassing given the cultural importance of both Italian and German. Second, the lack of memory of the machine (and this is a longer topic): how is it possible that an algorithm based on the more common usage (file as data, instead of file as tool) can prevail over profiling or any memory-based operation? Here I am a bit off topic, but I hope you can indulge me. My observation becomes far more frustrating in everyday life when, trying to get some basic code from ChatGPT (even 4, same AS as 3.5), there is an absolute lack of all the specs mentioned two lines before our current message. I wonder what machines are for if not for remembering, keeping things in memory and giving us a service. This sometimes leads me to imagine what the use of a computer is if I cannot save a file or... access it.
Of course I realize that this expectation of mine about keeping things in memory, or being precise about the user's specifications, would also mean a certain latency in the response. My final thought, to close this hilarious topic with a joke: maybe this amazing AS is all in JavaScript, and just like a reload button, a further message erases most of what was said. Though I would say, considering the wonders we read every day in papers and the news... at least I would expect a Python, not a JavaScript.
In fact this isn't a paper (yet), in part because I would like to have a native speaker manually check at least a sample of sentences. A native speaker could also try to converse directly with ChatGPT in their language, removing the need to use translation as a proxy task. Finding native speakers for all those languages is a challenge in itself; in fact, it's ultimately the same challenge that ChatGPT is facing: despite having millions of native speakers, those languages are still relatively obscure from the point of view of most "westerners" (and from that of Koreans as well). Coming to your horror story with files, even 3.5 nails it, so I guess you should consider switching over from whatever system you are using: https://chat.openai.com/share/1e61ddaa-4b64-4a92-8127-768474126c13
This custom GPT seems to work better, just by virtue of telling it that its job is to translate from Hausa to English and back: https://chat.openai.com/g/g-Cq1CBjjxo-hausagpt
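For what it's worth, the same trick should be reproducible with a plain system prompt; here is a rough sketch using the OpenAI Python client, where the model choice and prompt wording are my own guesses rather than what that custom GPT actually uses.

```python
# Rough sketch of the "just tell it its job" approach with the OpenAI Python client.
# Model name and prompt wording are assumptions; the linked custom GPT may differ.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Your job is to translate between Hausa and English, in both directions."},
        {"role": "user",
         "content": "Translate into Hausa: Where is the nearest market?"},
    ],
)
print(response.choices[0].message.content)
```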
https://aclanthology.org/2022.coling-1.379.pdf