I’d like something to describe images for me and also recognise any text contained in them. I’ve tried llama3.2-vision, llava and minicpm-v, but they all get the text recognition laughably wrong.
Or maybe I should lay my image recognition dreams to rest with my measly 8 GB RAM card.
Edit: gemma3:4b is even worse than the others. It doesn’t even find the text that is there and hallucinates other text instead.