simonw
I'm absolutely amazed at how capable the new 1B model is, considering it's just a 1.3GB download (for the Ollama GGUF version).

I tried running a full codebase through it (since it can handle 128,000 tokens) and asking it to summarize the code - it did a surprisingly decent job, incomplete but still unbelievable for a model that tiny: https://gist.github.com/simonw/64c5f5b111fe473999144932bef42...
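If you want to try something similar, a minimal sketch with the ollama Python client looks roughly like this (the model tag and project path are placeholders, and this is just the shape of the idea rather than my exact setup):

    # Rough sketch: concatenate a small codebase and ask the 1B model to summarize it.
    # Assumes `pip install ollama` and that `ollama pull llama3.2:1b` has been run.
    import pathlib
    import ollama

    code = ""
    for path in sorted(pathlib.Path("my_project").rglob("*.py")):  # placeholder project
        code += f"\n# --- {path} ---\n{path.read_text()}"

    response = ollama.chat(
        model="llama3.2:1b",
        messages=[{"role": "user", "content": f"Summarize this codebase:\n{code}"}],
    )
    print(response["message"]["content"])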

More of my notes here: https://simonwillison.net/2024/Sep/25/llama-32/

I've been trying out the larger image models using the versions hosted on https://lmarena.ai/ - navigate to "Direct Chat" and you can select them from the dropdown and upload images to run prompts.

opdahl
I'm blown away by just how open the Llama team at Meta is. It is nice to see that they are not only giving access to the models, but are also open about how they built them. I don't know how the future is going to go in terms of models, but I sure am grateful that Meta has taken this position and is pushing for more openness.
a_wild_dandan
"The Llama jumped over the ______!" (Fence? River? Wall? Synagogue?)

With 1-hot encoding, the answer is "wall", with 100% probability. Oh, you gave plausibility to "fence" too? WRONG! ENJOY MORE PENALTY, SCRUB!

I believe this unforgiving dynamic is why model distillation works well. The original teacher model had to learn via the "hot or cold" game on text answers. But when the child instead imitates the teacher's predictions, it learns semantically rich answers. That strikes me as vastly more compute-efficient. So to me, it makes sense why these Llama 3.2 edge models punch so far above their weight(s). But it still blows my mind thinking how far models have advanced from a year or two ago. Kudos to Meta for these releases.
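To make that concrete, here's a minimal PyTorch sketch of the two objectives (the temperature and scaling are the standard distillation recipe, not anything specific to Llama 3.2):

    import torch.nn.functional as F

    def hard_label_loss(student_logits, target_ids):
        # "Hot or cold": only the single true next token counts; every other
        # plausible guess ("fence") is penalized just as hard as a nonsense one.
        return F.cross_entropy(student_logits, target_ids)

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # The student matches the teacher's whole distribution, so partial credit
        # for "fence" and "river" is preserved instead of discarded.
        teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
        student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
        return F.kl_div(student_log_probs, teacher_probs,
                        reduction="batchmean") * temperature ** 2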

alanzhuly
Llama3.2 3B feels a lot better than other models of the same size (e.g. Gemma2, Phi3.5-mini).

For anyone looking for a simple way to test Llama3.2 3B locally with a UI, install nexa-sdk (https://github.com/NexaAI/nexa-sdk) and type in a terminal:

nexa run llama3.2 --streamlit

Disclaimer: I am from Nexa AI and nexa-sdk is open source. We'd love your feedback.

freedomben
If anyone else is looking for the bigger models on ollama and wondering where they are, the Ollama blog post answered that for me. They are "coming soon", so they just aren't quite ready yet [1]. I was a little worried when I couldn't find them, but it sounds like we just need to be patient.

[1]: https://ollama.com/blog/llama3.2

moffkalast
I've just tested the 1B and 3B at Q8, some interesting bits:

- The 1B is extremely coherent (feels something like Mistral 7B at 4 bits), and with flash attention and a 4-bit KV cache it only uses about 4.2 GB of VRAM for 128k context (rough setup sketched at the end of this comment)

- A Pi 5 runs the 1B at 8.4 tok/s. I haven't tested the 3B yet; it might need a lower quant to fit, and with 9T training tokens it'll probably degrade pretty badly when quantized that low

- The 3B is a certified Gemma-2-2B killer

Given that llama.cpp doesn't support any multimodality (they removed the old implementation), it might be a while before the 11B and 90B become runnable. Doesn't seem like they outperform Qwen-2-VL at vision benchmarks though.
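For anyone who wants to reproduce that 128k-context setup, here's a rough sketch with llama-cpp-python; the parameter names are from my memory of recent builds and the GGUF path is a placeholder, so treat it as a starting point rather than gospel:

    from llama_cpp import Llama
    import llama_cpp

    # 1B at Q8 with flash attention and a 4-bit quantized KV cache for long context.
    llm = Llama(
        model_path="Llama-3.2-1B-Instruct-Q8_0.gguf",  # placeholder path
        n_ctx=131072,                      # 128k context
        flash_attn=True,                   # needed for the quantized V cache
        type_k=llama_cpp.GGML_TYPE_Q4_0,   # 4-bit K cache (assumed constant name)
        type_v=llama_cpp.GGML_TYPE_Q4_0,   # 4-bit V cache
        n_gpu_layers=-1,                   # offload everything if a GPU is available
    )
    print(llm("Summarize: ...", max_tokens=64)["choices"][0]["text"])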

dhbradshaw
Tried out the 3B on ollama, asking questions about optics, bio, and Rust.

It's super fast with a lot of knowledge, a large context and great understanding. Really impressive model.

kingkongjaffa
llama3.2:3b-instruct-q8_0 is performing better than 3.1 8b-q4 on my MacBook Pro M1. It's faster and the results are better. It answered a few riddles and thought experiments better, despite being 3B vs 8B.

I just removed my install of 3.1-8b.

my ollama list is currently:

$ ollama list

NAME                            ID            SIZE    MODIFIED
llama3.2:3b-instruct-q8_0       e410b836fe61  3.4 GB  2 hours ago
gemma2:9b-instruct-q4_1         5bfc4cf059e2  6.0 GB  3 days ago
phi3.5:3.8b-mini-instruct-q8_0  8b50e8e1e216  4.1 GB  3 days ago
mxbai-embed-large:latest        468836162de7  669 MB  3 months ago

kgeist
Tried the 1B model with the "think step by step" prompt.

It gets "which is larger: 9.11 or 9.9?" right if it manages to mention that decimals need to be compared first in its step-by-step thinking. If it skips mentioning decimals, then it says 9.11 is larger.

It gets the strawberry question (how many r's in "strawberry") wrong even after enumerating all the letters correctly, probably because it can't properly count.

arnaudsm
Is there an up-to-date leaderboard with multiple LLM benchmarks?

LiveBench and LMSYS are weeks behind and sometimes refuse to add major models. And press releases like this cherry-pick their benchmarks and ignore better models like Qwen2.5.

If it doesn't exist I'm willing to create it

gdiamos
Llama 3.2 includes a 1B parameter model. This should mean 8x higher throughput for data pipelines. In our experience, smaller models are just fine for simple tasks like reading paragraphs from PDF documents.
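As a rough illustration of the kind of pipeline I mean (the libraries, file name, and model tag here are arbitrary placeholders):

    from pypdf import PdfReader   # any PDF text extractor would do
    import ollama

    reader = PdfReader("report.pdf")   # placeholder document
    text = "\n".join(page.extract_text() or "" for page in list(reader.pages)[:3])

    resp = ollama.chat(
        model="llama3.2:1b",
        messages=[{"role": "user",
                   "content": f"List the key facts stated in these paragraphs:\n{text}"}],
    )
    print(resp["message"]["content"])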
getcrunk
Still no 14/30B parameter models since Llama 2. Seriously killing real usability for power users / DIYers.

The 7/8B models are great for PoCs and moving to the edge for minor use cases … but there's a big, empty gap until 70B, which most people can't run.

The tin foil hat in me says this is the compromise the powers that be have agreed to. Basically being "open" but practically gimped for the average Joe techie. Basically arms control.

Ey7NFZ3P0nzAe
Interesting that its scores are somewhat below Pixtral 12B's: https://mistral.ai/news/pixtral-12b/
kombine
Are these models suitable for code assistance, as an alternative to Cursor or Copilot?
l5870uoo9y
> These models are enabled on day one for Qualcomm and MediaTek hardware and optimized for Arm processors.

Do they require a GPU, or can they be deployed on a VPS with a dedicated CPU?

gunalx
The 3B was pretty good multilingually (Norwegian), still a lot of gibberish at times and way more sensitive than the 8B, but more usable than Gemma 2 2B for multilingual use, and fine at my standard "Python list sorter with args" question. But 90B vision just refuses all my actually useful tasks, like helping recreate images in HTML or doing anything useful with the image data other than describing it. I haven't gotten this stuck with 70B or OpenAI before. Insane amount of refusals, all the time.
resters
This is great! Does anyone know if the Llama models are trained to do function calling like the OpenAI models are? And/or are there any function-calling training datasets?
chriskanan
The assessments of visual capability really need to be more robust. They are still using datasets like VQAv2, which, while providing some insight, have many issues. There are many newer datasets that serve as much more robust tests and are less prone to being affected by linguistic bias.

I'd like to see more head-to-head comparisons with community created multi-modal LLMs as done in these papers:

https://arxiv.org/abs/2408.05334

https://arxiv.org/abs/2408.03326

I look forward to reading the technical report, once it's available. I couldn't find a link to one yet.

sgt
Anyone on HN running models on their own local machines, like smaller Llama models or such? Or something else?
404mm
Can anyone recommend a webUI client for ollama?
josephernest
Can it run with llama-cpp-python? If so, where can we find and download the GGUF files? Are they distributed directly by Meta, or are they converted to GGUF format by third parties?
xrd
I'm currently fighting with a FastAPI Python app deployed to Render. It's interesting because I'm struggling to see how to encode the image and send it using curl. Their example sends directly from the browser and uses a data URI.

But, this is relevant because I'm curious how this new model allows image inputs. Do you paste a base64 image into the prompt?

It feels like these models can start not only providing the text generation backend, but start to replace the infrastructure for the API as well.

Can you input images without something in front of it like Open WebUI?
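For what it's worth, here's the rough shape I've been experimenting with; the endpoint, model name, and payload layout are guesses at an OpenAI-style API, not something I've verified against how Llama 3.2 is actually served:

    import base64
    import requests

    # Encode the image as a data URI, the same trick their browser example uses.
    with open("photo.jpg", "rb") as f:   # placeholder filename
        data_uri = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

    payload = {
        "model": "llama-3.2-11b-vision",   # placeholder model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }],
    }
    # Hypothetical OpenAI-compatible endpoint; swap in whatever your server exposes.
    r = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
    print(r.json()["choices"][0]["message"]["content"])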

thimabi
Does anyone know how these models fare in terms of multilingual real-world usage? I’ve used previous iterations of llama models and they all seemed to be lacking in that regard.
aussieguy1234
When using meta.ai, it's able to generate images as well as understand them. Has this also been open-sourced, or is it just a GPT-4o-style ability to see images?
desireco42
I have to say that running this model locally, I was pleasantly surprised by how well it ran. It doesn't use that many resources and produces decent output, comparable to ChatGPT. It's not quite at OpenAI's level, but for a lot of tasks, since it doesn't burden the computer, the local model is good enough.

Next I want to try to use Aider with it and see how this would work.

notpublic
Llama-3.2-11B-Vision-Instruct does an excellent job extracting/answering questions from screenshots. It is even able to answer questions based on information buried inside a flowchart. How is this even possible??
GaggiX
The 90B seems to perform pretty weakly on visual tasks compared to Qwen2-VL-72B: https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct, or am I missing something?
sk11001
Can one of these models be run on a single machine? What specs do you need?
bombi
Is Termux enough to run the 1B model on Android?
oulipo
Can the 3B run on an M1 MacBook? It seems to hog all the memory. The 1B runs fine.
troupo
It's hallucinating so badly, it's kinda hilarious

Literally everything about the quote below is wrong. The problem with these "amazing ground-breaking models" is that you never know what knowledge they actually possess.

--- start quote ---

>>> what is the last version of Ash Framework for Elixir you know

My training data goes up to December 2023, and according to the Ash Framework releases, the last stable version that I'm familiar with is:

* *Ash 2.14*: Released on April 8, 2021.

However, it's worth noting that Ash has been merged into Elixir itself, and most of its features have been integrated into the Elixir language and standard libraries.

In fact, Ash was one of the first frameworks to be included in Elixir, and it played a significant role in shaping the language. Many of Ash's core concepts, such as `defstruct/1` and `defmodule/1`, were later adopted by Elixir itself.

...

Keep in mind that Ash is no longer actively maintained as a separate framework, but its legacy continues to influence Elixir and its community.

--- end quote ---

bottlepalm
What mobile devices can the smaller models run on? iPhone, Android?
ofermend
Great release. Models just added to Hallucination Leaderboard: https://github.com/vectara/hallucination-leaderboard.

TL;DR:

* 90B-Vision: 4.3% hallucination rate

* 11B-Vision: 5.5% hallucination rate

gdiamos
Do inference frameworks like vLLM support vision?
stogot
Surprised no mention of audio?
dharma1
Are these better than Qwen at codegen?
taytus
meta.ai is still running on 3.1.
84adam
excited for this
sva_
Curious about the multimodal model's architecture. But alas, when I try to request access:

> Llama 3.2 Multimodal is not available in your region.

It sounds like they feed the continuous output of an image encoder into a transformer, similar to Transfusion [0]? Does anyone know where to find more details?

Edit:

> Regarding the licensing terms, Llama 3.2 comes with a very similar license to Llama 3.1, with one key difference in the acceptable use policy: any individual domiciled in, or a company with a principal place of business in, the European Union is not being granted the license rights to use multimodal models included in Llama 3.2. [1]

What a bummer.

0. https://www.arxiv.org/abs/2408.11039

1. https://huggingface.co/blog/llama32#llama-32-license-changes...

minimaxir
Off topic/meta, but the Llama 3.2 news topic received many, many HN submissions and upvotes but never made it to the front page: the fact that it's on the front page now indicates that moderators intervened to rescue it: https://news.ycombinator.com/from?site=meta.com (showdead on)

If there's an algorithmic penalty against the news for whatever reason, that may be a flaw in the HN ranking algorithm.

monkfish328
Zuckerberg has never liked having Android/iOS as gatekeepers, i.e. "platforms" for his apps.

He's hoping to control AI as the next platform through which users interact with apps. Free AI is then fine if the surplus value created by not having a gatekeeper to his apps exceeds the cost of the free AI.

That's the strategy. No values here - just strategy folks.

nmwnmw
- Llama 3.2 introduces small vision LLMs (11B and 90B parameters) and lightweight text-only models (1B and 3B) for edge/mobile devices, with the smaller models supporting 128K token context.

- The 11B and 90B vision models are competitive with leading closed models like Claude 3 Haiku on image understanding tasks, while being open and customizable.

- Llama 3.2 comes with official Llama Stack distributions to simplify deployment across environments (cloud, on-prem, edge), including support for RAG and safety features.

- The lightweight 1B and 3B models are optimized for on-device use cases like summarization and instruction following.

TheAceOfHearts
I still can't access the hosted model at meta.ai from Puerto Rico, despite us being U.S. citizens. I don't know what Meta has against us.

Could someone try giving the 90B model this word search problem [0] and tell me how it performs? So far, no model I've tried has managed to find a single word correctly.

[0] https://imgur.com/i9Ps1v6

alexcpn
In Kung Fu Panda there's a line where the Panda says "I love kung fuuuuuu!" I normally don't talk like this, but when I saw this (and started using it), I felt like yelling "I like Metaaaaa!" Or is it LLAMMMAAA? Or is it open source? Or is it this cool ecosystem that gives such value away for free...
404mm
Newbie question: what size model would be needed to have 10x-software-engineer skills but none of the general human knowledge (i.e. no need to know how to make a pizza or sequence your DNA)? Is there such a model?