• Septimaeus@infosec.pub
    6 days ago

    There was a comment yesterday that offered a simpler explanation than the headline’s conclusion.

    The papers were published by Iranian researchers, and in Farsi “scanning” (روبشی) and “vegetative” (رويشی) differ by only one character (ب and یـ), and those two letters happen to be adjacent on the keyboard.
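
    A minimal sketch of that one-character difference, assuming the usual Persian spellings of the two words. The code points are written as escapes so the comparison does not depend on fonts or copy/paste; the function name `differing_positions` is made up for illustration.

```python
# Assumed spellings, given as explicit code points (an assumption, not
# taken from the papers themselves).
SCANNING = "\u0631\u0648\u0628\u0634\u06cc"    # "scanning"
VEGETATIVE = "\u0631\u0648\u06cc\u0634\u06cc"  # "vegetative"


def differing_positions(a: str, b: str) -> list[int]:
    """Return the indexes at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


diff = differing_positions(SCANNING, VEGETATIVE)
print(diff)
for i in diff:
    # Show which code points are swapped at each differing position.
    print(f"U+{ord(SCANNING[i]):04X} vs U+{ord(VEGETATIVE[i]):04X}")
```

    Under those assumed spellings the words differ at a single position, which is what you would expect from a one-key slip.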

    That is, there’s some evidence that this is a typo or mistranslation that has been reused among non-native speakers, as opposed to a hallucination. If so, it could still be an LM replicating the error, but I’ve definitely seen humans do the exact same thing, especially when there’s a strong language barrier.

    Edit: brevity

    • bitcrafter@programming.dev
      5 days ago

      A couple of decades ago I got really confused because I found a lot of papers referring to “comer” cubes, but could not find an actual definition. Eventually I figured out that these were actually “corner” cubes, but somewhere a transcription error occurred that merged the r and n into an m, and this error kept getting propagated because people were just copying and pasting.
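
      A rough sketch of how that kind of merge could be modeled. The `CONFUSABLE` table and the `ocr_variants` helper are hypothetical, and the letter pairs are just examples of shapes that can blur into one glyph.

```python
# Hypothetical table of letter pairs that can look like a single glyph.
CONFUSABLE = {"rn": "m", "cl": "d", "vv": "w"}


def ocr_variants(word: str) -> set[str]:
    """Return every word produced by one confusable substitution."""
    variants = set()
    for pair, glyph in CONFUSABLE.items():
        start = word.find(pair)
        while start != -1:
            # Replace only this one occurrence of the pair.
            variants.add(word[:start] + glyph + word[start + len(pair):])
            start = word.find(pair, start + 1)
    return variants


print(ocr_variants("corner"))
```

      With this table, “corner” has exactly one variant, “comer”, so a search for either spelling could check the other as well.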

      • Septimaeus@infosec.pub
        5 days ago

        That’s an apt example from English, especially given the visual similarity of the error.

        It’s the kind of error we would expect AI to be especially resilient against, since the phrase “corner cube” probably appears many times in the training dataset.

        Likewise, scanning electron microscopes are common instruments in many schools and commercial labs, so an AI writing tool is likely to infer that a correction is needed, given how similar the two terms are.

        Transcription errors by human authors, however, have been dutifully copied into future works since we began writing stuff down.

    • catloaf@lemm.ee
      6 days ago

      Yes. Between that and some bad OCR not recognizing text in columns, causing it to see these words in separate columns as a single phrase, it makes sense that it would be replicated in machine translations.
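
      A small sketch of the column problem, assuming a two-column page. The text in `ROWS` is invented for illustration and is not taken from any real paper; the function names are made up too.

```python
# Invented two-column page: each tuple is (left column line, right column line).
ROWS = [
    ("the cell wall of vegetative", "electron microscopy showed"),
    ("cells was then removed", "a layered structure"),
]


def read_by_column(rows):
    """Read the whole left column first, then the whole right column."""
    left = " ".join(l for l, _ in rows)
    right = " ".join(r for _, r in rows)
    return f"{left} {right}"


def read_across(rows):
    """Read each physical line straight across, ignoring the column gap."""
    return " ".join(f"{l} {r}" for l, r in rows)


print(read_by_column(ROWS))
print(read_across(ROWS))
```

      Reading across the rows glues the end of a left-column line to the start of a right-column line, which is how a phrase that neither column contains can end up in the extracted text.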