Shortcoming in multilingual visual translation ability

by liziming - opened 19 days ago

I tried jina-vlm on multilingual visual translation tasks, that is, translating text in various languages within an image into Mandarin. However, the model seems to have some shortcomings on this task. Here are some examples:
1. I asked the model to translate the text in the image, but it only performed OCR and did not provide a translation.

2. I asked the model to translate the small text under the main title of the book on the left. The model did not follow the instruction and instead answered, "This book is an introductory book on Python machine learning, including deep learning, neural networks, reinforcement learning, natural language processing, computer vision, etc."
In addition, the model has other issues, such as translation errors and large amounts of repetitive output.
I’m curious to know if any of these points resonate with your experience. Any perspective or analysis you could offer would be greatly valued.

hanxiao

Jina AI org 18 days ago

I think a prompt like "describe the image in {language}" works better.
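
A minimal sketch of how that prompt pattern could be filled in for different target languages, assuming plain Python string formatting. The helper name `build_prompt` and the sample language are hypothetical, and no jina-vlm API call is shown here; the resulting string would be passed to the model however you normally send prompts.

```python
# Hypothetical helper: fills the {language} slot of the suggested prompt.
PROMPT_TEMPLATE = "describe the image in {language}"


def build_prompt(language: str) -> str:
    """Return the description-style prompt for the given target language."""
    return PROMPT_TEMPLATE.format(language=language.strip())


if __name__ == "__main__":
    # Example: ask for a Mandarin description instead of an explicit translation.
    print(build_prompt("Mandarin"))
```

Whether this wording actually yields better output than an explicit "translate" instruction would need to be checked on the images in question.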
