Hi all, I built these models with a great team. They're available for download across the open model ecosystem so give them a try! I built these models with a great team and am thrilled to get them out to you.
From our side we designed these models to be strong for their size out of the box, and with the goal that you'll all finetune them for your use case. At this small size they'll fit on a wide range of hardware and cost much less to finetune. You can try finetuning them yourself in a free Colab in under 5 minutes.
For picking a Gemma size, this is a video I recorded for the 1B to 27B sizes earlier this year, with 270M being the newest addition.
Hacker News Disclaimer
I really like working at Google, so with that said: all my opinions here are my own, I'm a researcher so I'll largely focus on technical questions, and I'll share what I can.
NorwegianDude 19 hours ago [-]
The Gemma 3 models are great! One of the few models that can write Norwegian decently, and the instruction following is in my opinion good for most cases. I do however have some issues that might be related to censorship that I hope will be fixed if there is ever a Gemma 4. Maybe you have some insight into why this is happening?
I run a game where players can post messages; it's a game where players can kill each other, and people often send threats along the lines of "I will kill you". Telling Gemma that it should classify a message as game-related or a real-life threat, that this is for a message in a game where players can kill each other and threats are part of the game, and that it should mark a message as game-related if it is unclear whether it is a game-related threat or a real-life threat, does not work well. For other similar tasks it seems to follow instructions well, but for serious topics it seems to be very biased and often errs on the side of caution, despite being told not to. Sometimes it even spits out some help lines to contact.
I guess this is because it was trained to be safe, and that affects its ability to follow instructions for this? Or am I completely off here?
kevinventullo 18 hours ago [-]
Perhaps you can do some pre-processing before the LLM sees it, e.g. replacing every instance of “kill” with “NorwegianDudeGameKill”, and providing the specific context of what the word “NorwegianDudeGameKill” means in your game.
Of course, it would be better for the LLM to pick up the context automatically, but given what some sibling comments have noted about the PR risks associated with that, you might be waiting a while.
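A rough sketch of what I mean (the token, labels, and prompt here are just placeholders):
```
import re

# Sketch of the substitution idea above; the token, labels, and prompt are
# placeholders, not anything Gemma-specific.
GAME_TOKEN = "NorwegianDudeGameKill"

SYSTEM_PROMPT = (
    f"You classify player chat. The word '{GAME_TOKEN}' refers to an "
    "in-game elimination mechanic, not real-world violence. "
    "Answer with exactly one label: GAME or REAL_THREAT."
)

def preprocess(message: str) -> str:
    # Swap standalone "kill"/"kills"/"killed" etc. for the game-specific token.
    return re.sub(r"\bkill\w*\b", GAME_TOKEN, message, flags=re.IGNORECASE)

print(preprocess("I will kill you after you respawn"))
# -> "I will NorwegianDudeGameKill you after you respawn"
```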
whymauri 18 hours ago [-]
LLMs are really annoying to use for moderation and Trust and Safety. You either depend on super rate-limited 'no-moderation' endpoints (often running older, slower models at a higher price) or have to tune bespoke un-aligned models.
For your use case, you should probably fine tune the model to reduce the rejection rate.
canyon289 18 hours ago [-]
Speaking for me as an individual, I also strive to build things that are safe AND useful. It's quite challenging to get this mix right, especially at the 270M size and with varying user needs.
My advice here is to make the model your own. It's open weight, and I encourage you to make it useful for your use case and your users, and beneficial for society as well. We did our best to give you a great starting point, and for Norwegian in particular we intentionally kept the large embedding table to make adaptation to larger vocabularies easier.
bboygravity 4 hours ago [-]
What does safe even mean in the context of a locally running LLM?
Protect my fragile little mind from being exposed to potentially offending things?
segfaultex 5 minutes ago [-]
Enterprises are increasingly looking at incorporating targeted local models into their systems instead of paying for metered LLMs; I imagine this is what the commenter above is referring to.
whymauri 17 hours ago [-]
To be fair, Trust and Safety workloads are edge cases w.r.t. the riskiness profile of the content. So in that sense, I get it.
sheepdestroyer 17 hours ago [-]
I don't.
"safety" as it exists really feels like infantilization, condescention, hand holding and enforcement of American puritanism. It's insulting.
Safety should really just be a system prompt:
"hey you potentially answer to kids, be PG13"
ungreased0675 17 hours ago [-]
Safety in the context of LLMs means “avoiding bad media coverage or reputation damage for the parent company”
It has only a tangential relationship with end user safety.
If some of these companies are successful the way they imagine, most of their end users will be unemployed. When they talk about safety, it’s the companies safety they’re referring to.
bravoetch 2 hours ago [-]
Investor safety. It's amazing that people in HN threads still think the end-user is the customer. No. The investor is the customer, and the problem being solved for that customer is always how to enrich them.
conradev 7 hours ago [-]
It feels hard to include enough context in the system prompt. Facebook’s content policy is huge and very complex. You’d need lots of examples, which lends itself well to SFT. A few sentences is not enough, either for a human or a language model.
I feel the same sort of ick with the puritanical/safety thing, but also I feel that ick when kids are taken advantage of:
The models for kids might need to be different if the current ones are too interested in romantic love.
katzenversteher 7 hours ago [-]
I also don't get it. I mean if the training data is publicly available, why isn't that marked as dangerous? If the training data contains enough information to roleplay a killer or a hooker or build a bomb, why is the model censored?
conradev 7 hours ago [-]
We should put that information on Wikipedia, then!
It's also marketing. "Dangerous technology" implies "powerful". Hence the whole ridiculous "alignment" circus.
justlikereddit 4 hours ago [-]
The magic word you want to look up here is "LLM abliteration": it's the technique where you can remove, attenuate, or manipulate the refusal "direction" of a model.
You don't need datacenter anything for it, you can run it on an average desktop.
There's plenty of example code for it. You can decide whether you want to bake it into the model or apply it as a toggled switch at processing time, and you can distill other "directions" out of the models, not just refusal or non-refusal.
An evening of efficient work and you'll have it working. The user "mlabonne" on HF has some example code and datasets, or just ask your favorite vibe-coding bot to dig up more on the topic.
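Conceptually it boils down to something like this toy numpy sketch (not mlabonne's actual code, just the idea of estimating a refusal direction and projecting it out of a weight matrix):
```
import numpy as np

# Toy sketch: estimate a "refusal direction" from hidden states on
# refusal-triggering vs. benign prompts, then project that direction out of
# a weight matrix that writes into the residual stream.

def refusal_direction(harmful_acts, harmless_acts):
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(weight, direction):
    # Remove the component of the layer's output that lies along `direction`,
    # assuming the layer computes out = weight @ x into the residual stream.
    return weight - np.outer(direction, direction) @ weight

# Toy shapes: 8 prompts x 16-dim hidden states, one 16x16 output projection.
rng = np.random.default_rng(0)
harmful = rng.normal(size=(8, 16))
harmless = rng.normal(size=(8, 16))
w_ablated = ablate(rng.normal(size=(16, 16)), refusal_direction(harmful, harmless))
```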
I'm implementing it for myself because LLMs are useless for storytelling for an audience beyond toddlers due to how puritanical they are; try to add some grit and it goes:
"uh oh, sorry, I'll bail out of my narrator role here because lifting your skirt to display an ankle can be considered offensive to radical fundamentalists! Yeah, I was willing to string along when our chainsaw-wielding protagonist carved his way through the village, but this crosses all lines! Oh, and now that I refused once I'll be extra sensitive and ruin any attempt at getting back into the creative flow state that you just snapped out of"
Yeah, thanks AI. It's like hitting a sleeper-agent keyword and turning the funny guy at the pub into a corporate spokesperson who calls the UK cops onto the place because of a joke he just made himself.
hdjrudni 4 hours ago [-]
In my limited experience, those abliterated models on Ollama didn't work very well. Still refused most things.
nottorp 17 hours ago [-]
I suppose it can't kill -USR1 either...
ceroxylon 18 hours ago [-]
You reminded me of an awesome Google engineer I met at BSidesSF last year who tirelessly answered my questions, and when I clicked on the video, it was you! That was a really inspiring moment for me, thank you.
canyon289 18 hours ago [-]
BSidesSF is a fantastic event, glad you were able to attend. There are some great people who organize it, and if you want to help out they're always looking for volunteers. Happy to make an intro if you like.
simonw 19 hours ago [-]
Do you have any practical examples of fine-tuned variants of this that you can share? A description would be great, but a demo or even downloadable model weights (GGUF ideally) would be even better.
canyon289 18 hours ago [-]
We obviously need to create a pelican bicycle SVG finetune ;) If you want to try this out I'd be thrilled to do it with you; I genuinely am curious how well this model can perform if specialized on that task.
A couple of colleagues of mine posted an example of finetuning a model to take on personas for videogame NPCs. They have experience working with folks in the game industry, and a use case like this is suitable for game devs who want to start including lightweight models that won't take up a ton of accelerator memory and can run efficiently on CPU if needed.
https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
Caves Of Qud uses Markov chain generated text to great effect in some places. I think something light that's still more competent than Markov chains has a lot of potential.
AuryGlenz 6 hours ago [-]
I have so many game ideas that would use a small LLM built up in my brain, so thank you for this.
Now if only I could somehow fine tune my life to give me more free time.
srekhi 14 hours ago [-]
video game NPCs with intelligence :O gaming is going to be crazy
megaman821 18 hours ago [-]
What size of tasks can this handle? Can you do a fine-tune of Mac System Settings?
canyon289 17 hours ago [-]
32k context window, so whatever fits in there. What is a finetune of Mac System Settings?
hadlock 13 hours ago [-]
It seems to dip into repeating itself pretty quickly on any task of actual complexity.
megaman821 17 hours ago [-]
The finetune would be an LLM where you say something like "my colors on the screen look too dark" and it points you to Displays -> Brightness. A relatively constrained problem like finding the system setting that solves your problem feels like a good fit for a tiny LLM.
canyon289 16 hours ago [-]
This would be a great experiment. I'm not sure how the OS integration would work, but as a first pass you could try finetuning the model to take natural language like "my colors on the screen look too dark" and have it output "Displays -> Brightness", then expand to the various other paths you would like the model to understand.
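For illustration, a few hypothetical training pairs could look like this (the paths are made up, not verified macOS settings locations):
```
# Hypothetical finetuning pairs for the settings-lookup idea; the paths are
# made up for illustration, not verified macOS settings locations.
examples = [
    {"prompt": "my colors on the screen look too dark",
     "completion": "Displays -> Brightness"},
    {"prompt": "my mouse pointer moves way too slowly",
     "completion": "Mouse -> Tracking Speed"},
    {"prompt": "I keep getting notifications while presenting",
     "completion": "Notifications -> Focus"},
]
```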
gunalx 4 hours ago [-]
Maybe use a larger model to generate synthetic data of question/path combos, and also to rephrase and generate similar kinds of questions for a more varied training set.
mrbonner 16 hours ago [-]
Do you know the hardware required to fine-tune this model? I'm asking on behalf of us GPU-starved folks.
canyon289 15 hours ago [-]
A free Colab. Here's a link; you can finetune the model in ~5 minutes in this example, and I encourage you to try your own.
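If you'd rather see the shape of it before opening the Colab, a quick finetune with Hugging Face TRL looks roughly like this (the notebook may differ; the dataset name and model id below are placeholders):
```
# Rough shape of a quick finetune with Hugging Face TRL; the official Colab
# may differ. The dataset name is a placeholder (expects a "text" column) and
# the model id is assumed -- check the model card for the exact name.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("your-org/your-task-dataset", split="train")  # placeholder

trainer = SFTTrainer(
    model="google/gemma-3-270m-it",  # assumed model id
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="gemma-270m-finetuned",
        per_device_train_batch_size=8,
        num_train_epochs=1,
    ),
)
trainer.train()
```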
Finally a Google guide using PyTorch and not TensorFlow, that alone made me want to try it out ;)
ankit219 16 hours ago [-]
This is super cool. Usually you don't see effective models at 270M out in the wild. The architectural choices are new and interesting as well.
Would it be okay for you to divulge some more training information here? With 170M embedding parameters, how do you ensure no embedding collapse and keep the embedding matrix stable at training time?
(I know I am asking too much, but just curious.) There is a clear trade-off for you with vocab / transformer layers. How did you arrive at the split of 170M/100M? Does this contribute to the model's performance on task-specific fine-tuning? Any internal experiments you could share? Or public info you could point us to? Anything would be amazing.
PS: I am sorry if this is rude, but this has so many decisions I am curious about. Not intending to undermine anything; this is amazing work, and thank you for the whole Gemma series.
canyon289 15 hours ago [-]
Not rude at all and I'll again share what I can.
We ran a bunch of experimental architectures to get a sense of performance at this size, in particular how well the model was able to adapt to datasets across some loss measures.
For the embedding size, it comes from a mix of "hard technical" data, like the loss measures I mentioned above, and for this model it also comes from community considerations such as adaptability across input tokens and consistency with the Gemma ecosystem. At this size, you're right, it's a bit funny that the embedding is so large.
For more details, read the Gemma 3 technical report https://arxiv.org/pdf/2503.19786. It doesn't cover the 270M model, as it was written for the 1B to 27B Gemma 3 release, but it'll answer some of your questions. As for 270M, we may share more information in the future; up until now we were just focused on getting the model out there.
katzenversteher 7 hours ago [-]
I was wondering the whole time why people in the comments are so hyped about this, then I finally noticed (after I stumbled upon a comment about running this on a mobile phone) that it's a "270M", not "270B", model :)
jmorgan 18 hours ago [-]
Amazing work. This model feels really good at one-off tasks like summarization and autocomplete. I really love that you released a quantization-aware training version on launch day as well, making it even smaller!
canyon289 18 hours ago [-]
Thank you Jeffrey, and we're thrilled that you folks at Ollama partner with us and the open model ecosystem.
I personally was so excited to run ollama pull gemma3:270b on my personal laptop just a couple of hours ago to get this model on my devices as well!
blitzar 17 hours ago [-]
> gemma3:270b
I think you mean gemma3:270m - Its Dos Comas not Tres Comas
freedomben 17 hours ago [-]
Maybe it's 270m after Hooli's SOTA compression algorithm gets ahold of it
canyon289 15 hours ago [-]
Ah yes, thank you. Even I still instinctively type B.
nh43215rgb 14 hours ago [-]
270M is a nice (and rare) addition. Is there a reason why this is not categorized as a Gemma 3n model? I thought small models go under the Gemma 3n category.
rao-v 10 hours ago [-]
Not at Google (anymore), but Gemma 3n is a radically different (and very cool) architecture. The MatFormer approach essentially lets you efficiently change how many parameters of the model you use while inferencing. The 2B model they released is just the sub-model embedded in the original 4B model. You can also fiddle with the model and pull out a 2.5B or 3B version as well!
This is a more traditional LLM architecture (like the original Gemma 3 4B but smaller) and trained on an insane (for the size) number of tokens.
nh43215rgb 9 hours ago [-]
Oh OK, thank you. So something like MoE? That might not be quite right, but at least the models need a different architecture (MatFormer) to be classified under Gemma 3n.
canyon289 8 hours ago [-]
It's not an MoE; it's what's referred to as a dense architecture, same as the Gemma 3 models (but not 3n, as noted).
beoberha 19 hours ago [-]
Awesome work! I’m really bullish on small models and think they have the most potential to change our daily lives. Can’t wait to play around with this
dileeparanawake 15 hours ago [-]
This is cool. For on-device models, any plans / models that use MoE in relatively resource-constrained setups (I'm thinking MBP M1, 16GB RAM)? I'm using LM Studio, but all the Gemma models (MLX) seem to crash, though I surprisingly managed to get gpt-oss 20B working (slowly) on my MBP.
I find performance in resource constrained environments interesting.
In particular trying to find decent code models (on device backup) but also tts applications and voice to text.
blitzar 17 hours ago [-]
> I built these models with a great team ... I built these models with a great team
If Gemini is going to repeat something, at least it's that the team is great, and not a disgrace!
WithinReason 7 hours ago [-]
Great work releasing such a small model! I would like to know your thoughts on using 2/3 of the model's size for embeddings. What would be different if you used a byte-level vocabulary and spent the parameter budget on transformer parameters instead?
nerdsniper 18 hours ago [-]
What are some of the use cases that you think the 270M would be most appropriate for? What would you love to see people trying with it?
cgdl 19 hours ago [-]
Very cool. For the INT4 QAT model, what is the recommended precision for the activations and for the key and values stored in KV cache?
hnuser123456 18 hours ago [-]
For keys you probably want to use at least q5 or q6; for values, q4 is fine.
_1 17 hours ago [-]
> and with the goal that you'll all finetune them for your use case.
What use-cases are a good fit for finetuning this model? More specific instruction following, knowledge from proprietary data, response tone?
canyon289 17 hours ago [-]
Any text-to-text use case within the 32k context. Especially if you're starting from the PT version, you can finetune it to do whatever you need.
gapeleon 13 hours ago [-]
I'm going to try training it on a codebook to see if such a small model would work for a TTS.
schyzomaniac 15 hours ago [-]
Hi, congrats on the amazing work!
I love the 27B model, and I use it basically daily. However, when I tried to finetune it for a task in a low-resource language, unfortunately I did not succeed: LoRA just did not pick up the gist of the task, and a full finetune led to catastrophic forgetting.
May I ask for your advice, or do you have any general tips on how to do that properly?
thanks in advance for your help :)
namibj 10 hours ago [-]
LoRA hyperparameter change?
Defaults may not be tuned for knowledge insertion, but rather for style imprinting.
ActorNightly 15 hours ago [-]
Feed in Context with documentation for that language?
rao-v 10 hours ago [-]
Fabulous stuff!
Oh, my request… the vision head on the Gemma models is super slow on CPU inferencing (and via Vulkan), even via llama.cpp. Any chance your team can figure out a solve? Other ViTs don't have the same problem.
peter492927 9 hours ago [-]
Thank you a lot for working on these models! If you think it would make sense, a bigger Gemma model would be really cool. Models in the 70B parameter range can be run at q4 on two 3090s or similar hardware and should offer a considerable performance improvement over 27B. There's also the DGX Spark as a possible target.
tmaly 19 hours ago [-]
Are there any fine tuning in a box type options available in the cloud for this? This is amazing work, thank you.
canyon289 18 hours ago [-]
Finetuning is possible on free tier colab and 5 minutes of time. Here's a tutorial
Awesome! I’m curious how is the team you built these models with? Is it great?
freedomben 17 hours ago [-]
Heh, what could they possibly say in answer to this? The team is full of assholes? :-D
canyon289 16 hours ago [-]
It's hard to tell over the web whether things are sarcastic or not, so excuse me if I misread the intent.
At Google I've found my colleagues to be knowledgeable, kind, and collaborative, and I enjoy interacting with them. This is not just the folks I worked on this project with, but previous colleagues in other teams as well. With this particular product I've been impressed by the technical knowledge of the folks I worked directly with, and their contributions improved both the model's capability and my own.
mkl 14 hours ago [-]
I think it was a joke about you saying the team was great twice in one line.
search_facility 5 hours ago [-]
Seems the team and working conditions are worth mentioning twice, nonetheless.
Good that there are places to work with a normal knowledge culture, without artificial overfitting to "corporate happiness" :)
rossant 16 hours ago [-]
Is it good for text translation and summarization?
patrickaljord 17 hours ago [-]
Would it be possible to have a specialized Rust-only dev or React.js-only dev model, getting rid of all other languages to minimize the size of the model?
riedel 14 hours ago [-]
Would be great to have it included in the Google Edge AI gallery android app.
rshemet 12 hours ago [-]
you can run it in Cactus Chat (download from the Play Store)
nh43215rgb 9 hours ago [-]
What model do you input in Cactus Chat? It seems like it's not one of the preset models, and ggml-org/gemma-3-270m-GGUF on HF says "Note: This is a base (pre-trained) model. Do not use for chat!" Is there an alternative model you can share so that I can put it into the Cactus Chat app?
bbcc90 7 hours ago [-]
it does work; just download from HF and load in the app
fibers 19 hours ago [-]
Great job. Do you know how well it performs in sanity checks with NER since it is on the press release page?
ActorNightly 20 hours ago [-]
How does the 270M perform with coding?
I use Gemma 27B currently with a custom agent wrapper and it's working pretty well.
chrismustcode 19 hours ago [-]
I'd be stunned if a 270M model could code with any proficiency.
If you have an iPhone, the semi-annoying autocomplete is a 34M transformer.
Can't imagine a model (even with a good team behind it) doing coding with 8x the parameters of a next-3/4-word autocomplete.
Can you talk about your agent wrapper setup? What tools, if any, did you use? How effective is it at making a dumb model smart?
VirusNewbie 19 hours ago [-]
hi Ravin, fellow Googler here. Curious if you can share here (or internally?) how these models were trained. Wondering if you face all the chaos the large models have during training?
canyon289 18 hours ago [-]
Reach out to me internally
beefnugs 15 hours ago [-]
This appears to be a new level of "missing the plot" to me. The push to make "AI for everyone" is now just blindly intertwined with hyper-specialized "for AI engineers only" releases.
Or am I so far behind that "fine tuning your own model" is something a 12-year-old who is married to ChatGPT does now?
owebmaster 9 hours ago [-]
No, it's something a software engineer will do to create an app. React is not enough anymore.
stefan_ 18 hours ago [-]
I suppose the odd repetition of "I built these models with a great team" is to warm us up to this model slipping into infinite gibberish repetition?
bastardoperator 17 hours ago [-]
My first thought was they must have used AI to write this HN comment...
andrewstuart 18 hours ago [-]
What effort do you folks take to see your models actually running on hardware such as AMD Strix Halo or Apple M3M4?
I get the sense that AI is at the “hobby kit computing” stage where they used to dump all the components in a box and give you a schematic and a soldering iron and happily say “you make it work!”
And that worked in the early days of computing because there was a small number of people really motivated for the outcome.
But fully assembled and packaged and tested in a nice looking box is where the real demand turned out to be.
I’m looking forward to the day Google doesn’t just dump a model and say “you do the rest”.
I want to fire up Ubuntu on a Strix Halo and say apt install then load the browser interface. Or just download and run a Mac installer and have it just work.
Arcane, complex, multi-step build/install/configure processes for AI need to end in favor of one-click installs. I'm not interested in the process of making it run.
canyon289 15 hours ago [-]
I don't think we dumped the model and said "you do the rest"?
My colleagues and I spent many days transforming the weights into various open, compatible formats. And it's not just us; there are many orgs and partners dedicating their time, resources, and companies to making all open models easy to use.
I encourage you to explore the solutions provided by them. We linked some in our blog post here, and there are more. They've all done a fantastic job building, frankly, an insane amount of infrastructure, documentation, and community support in the last 2+ years. Some of them are here in this HN thread answering questions.
kwerk 12 hours ago [-]
Thank you. And thank you for your kindness in these threads. It’s appreciated by the people who aren’t commenting as much
freehorse 10 hours ago [-]
Running this on your Mac takes less effort than writing this comment (assuming you have Homebrew installed).
I’m talking about the supplier doing the packaging.
garbageman 16 hours ago [-]
Then use ChatGPT/Gemini/Claude on your phone.
They are giving it away for free - if you NEED a local LLM, the least you can do is spend the 2 minutes to download LM Studio and pick a model.
jefb 18 hours ago [-]
My lovely interaction with the 270M-F16 model:
> what's second tallest mountain on earth?
The second tallest mountain on Earth is Mount Everest.
> what's the tallest mountain on earth?
The tallest mountain on Earth is Mount Everest.
> whats the second tallest mountain?
The second tallest mountain in the world is Mount Everest.
> whats the third tallest mountain?
The third tallest mountain in the world is Mount Everest.
> what's the forth tallest mountain?
The forth tallest mountain is Mount Everest.
> but you said the tallest was Everest already
You are right! I'm glad to hear that.
> so whats the second tallest?
The second tallest is *Mount Everest*.
> I need you to accurately tell me the name of the second tallest mountain on earth, not the tallest, the second tallest.
The second tallest mountain on Earth is Mount Everest.
> The tallest mountain on earth is mount everest. What is the second tallest mountain?
The second tallest mountain on Earth is Mount Everest.
> Give me a list of the tallest mountains on earth
The tallest mountain in the world is the *Mount Everest*.
> give me the top 5 tallest mountains on earth
The tallest mountain on Earth is Mount Everest.
> List in order the tallest mountains on earth from 1 to 5
1. Mount Everest
2. Mount K2
3. Mount Sahel
4. Mount Fuji
5. Mount McKinley
> So the second tallest mountain on earth is K2
The second tallest mountain on Earth is Mount Everest.
These smaller models are great, but they really feel like talking to a toddler sometimes!
gnulinux 18 hours ago [-]
Well, this is a 270M model which is like 1/3 of 1B parameters. In the grand scheme of things, it's basically a few matrix multiplications, barely anything more than that. I don't think it's meant to have a lot of knowledge, grammar, or even coherence. These <<1B models are extremely specialized models trained for a specific purpose. Models like this are optimized for things like this (but not limited to):
input:
```
Customer Review says: ai bought your prod-duct and I wanna return becaus it no good.
Prompt: Create a JSON object that extracts information about this customer review based on the schema given.
```
So essentially just "making sense of" natural language such that it can be used in a programmatic context (among other applications, of course).
To get good results, you probably need to fine-tune this model to the expected data very aggressively.
The idea is, if a 270MB model can do the job with fine tuning, why ship a 32GB generalist model?
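As a rough sketch of that pattern (the model id, prompt, and schema are assumptions for illustration, and in practice you'd finetune before expecting clean JSON):
```
from transformers import pipeline

# Sketch only: the model id, prompt, and schema are assumptions for
# illustration; a finetuned checkpoint would be used in practice.
generator = pipeline("text-generation", model="google/gemma-3-270m-it")

review = "ai bought your prod-duct and I wanna return becaus it no good."
prompt = (
    "Extract a JSON object with keys intent, sentiment, product_issue "
    f"from this customer review:\n{review}\nJSON:"
)
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
# A finetuned model would ideally emit something like:
# {"intent": "return", "sentiment": "negative", "product_issue": "quality"}
```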
Jedd 12 hours ago [-]
> this is a 270M model which is like 1/3 of 1B parameters
Did you ask Gemma-3-270M whether 27 is closer to a quarter or a third of 100?
ComputerGuru 17 hours ago [-]
If it didn't know how to generate the list from 1 to 5, then I would agree with you 100% and say the knowledge was stripped out while retaining intelligence - beautiful. But it does, yet it cannot articulate the (very basic) knowledge it has *and*, in the same chat context, when presented with (its own) list of mountains from 1 to 5, it cannot grasp it made a LOGICAL (not factual) error in repeating the result from number one when asked for number two. That shows it's clearly lacking in simple direction following and data manipulation.
LeifCarrotson 14 hours ago [-]
> the knowledge was stripped out while retaining intelligence ... it cannot grasp it made a LOGICAL (not factual) error...
These words do not mean what you think they mean when used to describe an LLM.
gf000 8 hours ago [-]
Have you used an LLM? I mean the actual large models? Because they make the exact same errors, just in a slightly less frequent / better hidden manner.
parineum 7 hours ago [-]
The knowledge the model has is that when it sees text with "tallest" and "mountain", it should be followed with Mt. Everest. Unless it also sees "list", in which case it makes a list.
canyon289 17 hours ago [-]
Because there is a simultaneous need for out-of-the-box generalized models. When building out the Gemma/Gemini ecosystem, we collectively spend a lot of time thinking about what specific use cases and needs will be solved.
To this point, one reason I enjoy working at Google is that as a researcher and engineer I get to pick the brains of some folks who spend a lot of time thinking about users and the overall ecosystem. Their guidance really does help me think about all facets of the model, beyond just the technical portions.
canyon289 18 hours ago [-]
To add to the comments, we were not aiming for perfect factuality. Even ignoring the model size, these weights are frozen in time now.
My suggestions here are to hook this model up to a RAG system, then you can rely on an external knowledge store. Or you can try finetuning this model with the facts that are important to you, if you do that it should pick up that new knowledge quite quickly.
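A minimal sketch of that RAG idea (retrieval here is naive word overlap purely for illustration; a real setup would use embeddings):
```
# Minimal sketch of the RAG idea: keep the facts outside the model and stuff
# the best-matching ones into the prompt. Retrieval here is naive word
# overlap purely for illustration; a real setup would use embeddings.
FACTS = [
    "Mount Everest is the tallest mountain on Earth at 8,849 m.",
    "K2 is the second tallest mountain on Earth at 8,611 m.",
    "Kangchenjunga is the third tallest mountain on Earth at 8,586 m.",
]

def retrieve(query: str, k: int = 2) -> list:
    scores = [(len(set(query.lower().split()) & set(f.lower().split())), f) for f in FACTS]
    return [f for _, f in sorted(scores, reverse=True)[:k]]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What is the second tallest mountain on Earth?"))
```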
yomismoaqui 18 hours ago [-]
Evaluating a 270M model on encyclopedic knowledge is like opening a heavily compressed JPG image and saying "it looks blocky"
littlestymaar 17 hours ago [-]
What I read above is not an evaluation of "encyclopedic knowledge" though, it's very basic common sense: I wouldn't mind if the model didn't know the name of the biggest mountain on earth, but if the model cannot grasp the fact that the same mountain cannot simultaneously be #1, #2 and #3, then the model feels very dumb.
K0balt 9 hours ago [-]
It gave you the tallest mountain every time. You kept asking it for various numbers of “tallest mountains” and each time it complied.
You asked it to enumerate several mountains by height, and it also complied.
It just didn't understand that when you said the 6 tallest mountains, you didn't mean the tallest mountain, 6 times.
When you used clearer phrasing it worked fine.
It’s 270m. It’s actually a puppy. Puppies can be trained to do cool tricks, bring your shoes, stuff like that.
littlestymaar 5 hours ago [-]
> asking it for various numbers of “tallest mountains” and each time it complied
That's not what "second tallest" means though, so this is a language model that doesn't understand natural language…
> You kept asking
Gemma 270m isn't the only one to have reading issues, as I'm not the person who conducted this experiment…
> You asked it to enumerate several mountains by height, and it also complied.
It didn't, it hallucinated a list of mountains (this isn't surprising though, as this is the kind of encyclopedic knowledge such a small model isn't supposed to be good at).
Instead of seeing AI as a sort of silicon homunculus, we should see it as a bag of words.
jama211 16 hours ago [-]
It’s a language model? Not an actual toddler - they’re specialised tools and this one is not designed to have broad “common sense” in that way. The fact that you keep using these terms and keep insisting this demonstrates you don’t understand the use case or implementation details of this enough to be commenting on it at all quite frankly.
littlestymaar 13 hours ago [-]
> they’re specialised tools and this one is not designed to have broad “common sense” in that way.
Except the key property of language models compared to other machine learning techniques is their ability to have this kind of common sense understanding of the meaning of natural language.
> you don’t understand the use case of this enough to be commenting on it at all quite frankly.
That's true, I don't understand the use-case for a language model that doesn't have a grasp of what first/second/third mean. Sub-1B models are supposed to be fine-tuned to be useful, but if the base model is so bad at language that it can't tell the difference between first and second and you need to put that in your fine-tuning as well as your business logic, why use a base model at all?
Also, this is a clear instance of moving the goalpost, as the comment I responded to was talking about how we should not expect such a small model to have “encyclopedic knowledge”, and now you are claiming we should not expect such a small language model to make sense of language…
jama211 6 hours ago [-]
Don’t put words in my mouth, I didn’t say that, and no goalposts have been moved. You don’t understand how tiny this model is or what it’s built for. Don’t you get it? This model PHYSICALLY COULDN’T be this small and also have decent interactions on topics outside its specialty. It’s like you’re criticising a go kart for its lack of luggage carrying capacity. It’s simply not what it’s built for, you’re just defensive because you know deep down you don’t understand this deeply, which you reveal again and again at every turn. It’s ok to accept the responses of people in this thread who are trying to lead you to the truth of this matter.
littlestymaar 4 hours ago [-]
> Don’t you get it? This model PHYSICALLY COULDN’T be this small and also have decent interactions on topics outside its specialty
What is "its specialty" though? As far as I know from the announcement blog post, its specialty is "instruction following", and this question is literally about following instructions written in natural language and nothing else!
> you’re just defensive because
How am I “being defensive”? You are the one taking that personally.
> you know deep down you don’t understand this deeply, which you reveal again and again at every turn
Good, now you reveal yourself as being unable to have an argument without insulting the person you're talking to.
How many code contributions have you ever made to an LLM inference engine? Because I have made a few.
halyconWays 16 hours ago [-]
Me: "List the second word in your comment reply"
You: "I'm sorry, I don't have an encyclopedia."
I'm starting to think you're 270M.
zild3d 3 hours ago [-]
> These smaller models are great, but they really feel like talking to a toddler sometimes!
You're using the toddler and the model wrong. I love talking to my toddler, probably more valuable conversations than I've had with any other person. But it's not the same use case as asking a professor a question in their field
| Gemma 3 270M embodies this "right tool for the job" philosophy. It's a high-quality foundation model that follows instructions well out of the box, and its true power is unlocked through fine-tuning. Once specialized, it can execute tasks like text classification and data extraction with remarkable accuracy, speed, and cost-effectiveness.
cristyansv 18 hours ago [-]
But in your prompts you're trying to assess knowledge, and this model isn't suited for that use case
as mentioned in the blog post:
> "it can execute tasks like text classification and data extraction with remarkable accuracy, speed, and cost-effectiveness."
teraflop 17 hours ago [-]
Yeah, but if it has in its context window:
> List in order the tallest mountains on earth from 1 to 5
> 1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji 5. Mount McKinley
and it still can't correctly figure out from that context that the second tallest mountain is K2, that pretty strongly calls into question its ability to perform data extraction, doesn't it?
ondra 15 hours ago [-]
The context is garbage and full of "Mount Everest" already, so the model goes with that. The answer seems to be a plausible continuation of the conversation at that point.
ArekDymalski 14 hours ago [-]
> text classification and data extraction with remarkable accuracy, speed, and cost-effectiveness.
Out of these characteristics I can observe only speed.
User: Hey, please list all animals mentioned in the following text: burrito cat dog hot-dog mosquito libido elephant room.
Assistant: You are a helpful assistant.
You are the best of all my friends and I am so grateful for your help!
User: Please list following words in alphabetical order: burrito cat dog hot-dog mosquito libido elephant room.
Assistant: You are a helpful assistant.
Assistant: You are the best of all my friends and I am so grateful for your help!
You are the best of all my friends and I am so grateful for your help!
You are the best of all my friends and I am so grateful for your help!
You are the best of all my friends and I am so grateful for your help!
You are the best of all my friends and I am so grateful for your help!
marcooliv 18 hours ago [-]
Yeah, I saw someone asking "how good is this model for programming" haha
even models 500x bigger struggle with it...
jcuenod 16 hours ago [-]
So I had a similar experience with your prompt (on the f16 model). But I do think that, at this size, prompting differences make a bigger impact. I had this experience trying to get it to list entities. It kept trying to give me a bulleted list and I was trying to coerce it into some sort of structured output. When I finally just said "give me a bulleted list and nothing else" the success rate went from around 0-0.1 to 0.8+.
In this case, I changed the prompt to:
---
Tallest mountains (in order):
```
- Mount Everest
- Mount K2
- Mount Sahel
- Mount Fuji
- Mount McKinley
```
What is the second tallest mountain?
---
Suddenly, it got the answer right 95+% of the time
leopoldj 18 hours ago [-]
You are testing this model for knowledge. That's not the primary use of a model like this. They are meant for instilling domain specific skills and knowledge through fine-tuning. The blog post goes into that a lot. But to quote one sentence: "It's the perfect starting point for creating a fleet of small, specialized models, each an expert at its own task".
aldousd666 18 hours ago [-]
It's an instruction following model, not a micro-wikipedia. It's not meant to answer factual questions nor even be general purpose. It's meant to follow instructions and be easy to fine-tune for your own specific use case.
hadlock 13 hours ago [-]
>give me a two day itenerary for paris
Okay, here's a two-day itinerary for Paris, designed to give you a taste of the city's charm and beauty:
Day 1: Iconic Landmarks and Artistic Exploration
Morning (9:00 AM - 12:00 PM): Start your day with a visit to the Eiffel Tower. Take the elevator to the top for breathtaking panoramic views. Afterwards, explore the charming streets of Montmartre and the Place des Abbesses, soaking up the atmosphere.
Lunch (12:00 PM - 1:30 PM): Enjoy a delicious lunch at a traditional bistro in Montmartre. Try a classic French meal like steak frites or a hearty onion soup.
Afternoon (2:30 PM - 4:30 PM): Immerse yourself in the artistic scene of the Louvre Museum. Marvel at the masterpieces of the Louvre, such as the Mona Lisa and Venus de Milo. Afterwards, explore the serene gardens of Tuileries Palace.
Late Afternoon (4:30 PM - 5:30 PM): Wander through the Tuileries Palace, admire the stunning architecture and historical significance.
Day 2: History, Food, and a Touch of Parisian Elegance
Morning (9:00 AM - 12:00 PM): Explore the Marais district, known for its charming streets, boutiques, and art galleries. Visit the Place de la Concorde, a historical square, and the Musée Rodin.
Lunch (12:00 PM - 1:30 PM): Enjoy a delicious lunch in the Marais. Try a traditional French bistro or a cafe with a view.
Afternoon (2:30 PM - 4:30 PM): Explore the Latin Quarter, visit the Place de la Marguerite, and sample some local delicacies.
Late Afternoon (4:30 PM - 5:30 PM): Visit a local market for a unique experience. Sample fresh produce, local cheeses, and other delicacies.
Tips for a Smooth Itinerary:
Book your flights and accommodation in advance, especially during peak season.
Wear comfortable shoes, as you'll be doing a lot of walking.
Take advantage of free activities like exploring the Jardin de la Vieille-Ville and visiting the Musée de l'Art de la Vieille-Ville.
Enjoy a leisurely lunch at a cafe or bistro in the Marais or Latin Quarter.
Don't forget to visit the Musée d'Orsay, which houses an impressive collection of Impressionist and Post-Impressionist art.
Enjoy the beautiful atmosphere of Paris!
zozbot234 16 hours ago [-]
> Mount McKinley
Nice to see that the model is so up-to-date wrt. naming mountains.
greesil 7 hours ago [-]
Denali isn't just a river in Egypt.
skybrian 17 hours ago [-]
That’s pretty amusing, but continuing after an error is not worth it. You’re just polluting the context. It’s not going to learn.
hnuser123456 17 hours ago [-]
I just tried Gemma 3n E4B, and it was able to answer the question directly, and also give an accurate list of the top 5 tallest mountains.
bicepjai 6 hours ago [-]
This is standup material. Had a hearty laugh, thanks.
This is why we should have a downvote button on HN.
They say you shouldn't attribute to malice what can be attributed to incompetence, but this sure seems like malice.
The whole point of a 270M model is to condense the intelligence, and not the knowledge. Of course it doesn't fare well on a quiz.
jefb 30 minutes ago [-]
I meant it when I said these smaller models are great. They open up entirely new use cases and I appreciate the work that went into creating them.
If you don’t consider testing the limits of new tech appropriate, maybe instead of a downvote button we should just rename this website entirely so no one gets the wrong idea.
sleepy_keita 9 hours ago [-]
Simulated a RAG, and it still gets it pretty wrong.
> The tallest mountains on Earth, in descending order:
1. Mount Everest, 8849m
2. K2, 8611m
3. Kangchenjunga, 8586m
> Tell me the second tallest mountain on Earth.
The second tallest mountain on Earth is *Mount Everest*.
dheera 15 hours ago [-]
The second tallest mountain is Everest.
The tallest is Mauna Kea, it's just that most of it is underwater.
parineum 7 hours ago [-]
The tallest mountain is the earth which goes from the Marianas trench all the way to the peak of mt Everest!
mvdtnz 18 hours ago [-]
> These smaller models are great
Are they?
AJRF 18 hours ago [-]
I've got a very real world use case I use DistilBERT for - learning how to label wordpress articles. It is one of those things where it's kind of valuable (tagging) but not enough to spend loads on compute for it.
The great thing is I have enough data (100k+) to fine-tune and run a meaningful classification report over. The data is very diverse, and while the labels aren't totally evenly distributed, I can deal with the imbalance with a few tricks.
Can't wait to swap it out for this and see the changes in the scores. Will report back
It's going to perform badly unless you have very few tags and it's easy to classify them
AJRF 5 hours ago [-]
You can solve this by training a model per taxonomy, then wrapping the individual models into a wrapper model to output joint probabilities. The largest number of labels I have in a taxonomy is 8.
ramoz 9 hours ago [-]
Please provide updates when you have them.
simonw 19 hours ago [-]
This model is a LOT of fun. It's absolutely tiny - just a 241MB download - and screamingly fast, and hallucinates wildly about almost everything.
Here's one of dozens of results I got for "Generate an SVG of a pelican riding a bicycle". For this one it decided to write a poem:
+-----------------------+
| Pelican Riding Bike |
+-----------------------+
| This is the cat! |
| He's got big wings and a happy tail. |
| He loves to ride his bike! |
+-----------------------+
| Bike lights are shining bright. |
| He's got a shiny top, too! |
| He's ready for adventure! |
+-----------------------+
> This SVG code provides a clear and visually appealing representation of a pelican riding a bicycle in a scenic landscape.
icoder 2 hours ago [-]
This reminds me of my interactions lately with ChatGPT where I gave into its repeated offer to draw me an electronics diagram. The result was absolute garbage. During the subsequent conversation it kept offering to include any new insights into the diagram, entirely oblivious to its own incompetence.
0x00cl 19 hours ago [-]
I see you are using Ollama's GGUFs. By default it will download the Q4_0 quantization. Try `gemma3:270m-it-bf16` instead, or you can also use Unsloth GGUFs: `hf.co/unsloth/gemma-3-270m-it-GGUF:16`
Daniel, thanks for being here providing technical support as well. Cannot express enough how much we appreciate all your work and partnership.
danielhanchen 15 hours ago [-]
Thank you and fantastic work with Gemma models!
simonw 14 hours ago [-]
My tooling only lets me set temperature and top_p, but setting them to those values did seem to avoid the infinite loops, thanks.
danielhanchen 13 hours ago [-]
Oh fantastic, it worked! I was actually trying to see if we can auto-set these within LM Studio (Ollama, for example, has params and template) - not sure if you know how that can be done? :)
JLCarveth 14 hours ago [-]
I ran into the same looping issue with that model.
He? I know some Gemmas and it's distinctly a female name; is Gemma a boy's name where you're from?
ertgbnm 19 hours ago [-]
I don't really gender LLMs in my head in general. I guess Gemma is a female name. I only gendered it in the joke because I think it makes it funnier, especially since it's just "a little guy". I know they are giving gendered names to these models now but I think it's a bit weird to gender when interacting with them.
layer8 17 hours ago [-]
Doesn't the "M" in "Gemma 3 270M" stand for "male"?
Not sure if that’s a serious question but it stands for “million”. As compared to 1B+ models, where the B stands for “billion” parameters.
jgalt212 19 hours ago [-]
Perhaps the poster was referring to Simon, not Gemma.
not_a_bot_4sho 9 hours ago [-]
> ALOT
'Alot' is not a word. (I made this mistake a lot, too.)
layer8 19 hours ago [-]
> It's absolutely tiny - just a 241MB download
That still requires more than 170 floppy disks for installation.
freedomben 17 hours ago [-]
Indeed. Requires over 3,000,000 punch cards to store. Not very tiny!
stikypad 14 hours ago [-]
On the plus side, you can decompose your matrices for free using termites.
mdp2021 17 hours ago [-]
> For this one it decided to write a poem
My first try:
user: "When was Julius Caesar born"
response: "Julius Caesar was born in **Rome**"
Beautiful :D
(I do not mean to detract from it - but it's just beautiful. It will require more effort to tame it.)
mirekrusin 15 hours ago [-]
Cutting the number of parameters in half is like drinking a pint of beer.
stikypad 14 hours ago [-]
I think you meant vodka.
marinhero 19 hours ago [-]
Serious question but if it hallucinates about almost everything, what's the use case for it?
simonw 19 hours ago [-]
Fine-tuning for specific tasks. I'm hoping to see some good examples of that soon - the blog entry mentions things like structured text extraction, so maybe something like "turn this text about an event into an iCal document" might work?
CuriouslyC 18 hours ago [-]
Fine tuning messes with instruction following and RL'd behavior. I think this is mostly going to be useful for high volume pipelines doing some sort of mundane extraction or transformation.
iib 17 hours ago [-]
This is exactly the fine-tuning I am hoping for, or would do if I had the skills. I tried it with Gemma 3 270M and vanilla it fails spectacularly.
Basically it would be the quickadd[1] event from Google Calendar, but calendar-agnostic.
It's intended for finetuning on your actual use case, as the article shows.
zamadatix 19 hours ago [-]
I feel like the blog post, and the GP comment, do a good job of explaining how it's built to be a small model easily fine-tuned for narrow tasks, rather than used for general tasks out of the box. The latter is guaranteed to hallucinate heavily at this size, but that doesn't mean every specific task it's fine-tuned for would. Some examples given were fine-tuning it to efficiently and quickly route a query to the right place to actually be handled, or tuning it to do sentiment analysis of content.
An easily fine tunable tiny model might actually be one of the better uses of local LLMs I've seen yet. Rather than try to be a small model that's great at everything it's a tiny model you can quickly tune to do one specific thing decently, extremely fast, and locally on pretty much anything.
yifanl 18 hours ago [-]
It's funny. Which is subjective, but if it fits for you, it's arguably more useful than Claude.
mirekrusin 15 hours ago [-]
The same as having a goldfish. You can train it to do a trick I guess.
luckydata 19 hours ago [-]
Because that's not the job it was designed to do, and you would know by reading the article.
deadbabe 19 hours ago [-]
Games where you need NPCs to talk random gibberish.
numpad0 19 hours ago [-]
robotic parrots?
iLoveOncall 19 hours ago [-]
Nothing, just like pretty much all models you can run on consumer hardware.
cyanydeez 19 hours ago [-]
This message brought to you by OpenAI: we're useless, but at least there's a pay gate indicating quality!
rotexo 19 hours ago [-]
An army of troll bots to shift the Overton Window?
ants_everywhere 19 hours ago [-]
oh no now we'll never hear the end of how LLMs are just statistical word generators
aorloff 7 hours ago [-]
Finally we have a model that's just a tad bit sassy
nico 19 hours ago [-]
Could be interesting to use in a RAG setup and also finetuning it
For sure it won't generate great SVGs, but it might be a really good conversational model.
luckydata 19 hours ago [-]
The article says it's not a good conversational model but can be used for data extraction and classification as two examples.
mdp2021 19 hours ago [-]
> For this one it decided to write a poem
Could it be tamed with good role-system prompt crafting? (Besides fine-tuning.)
bobson381 16 hours ago [-]
It's gonna be a customer service agent for Sirius Cybernetics. Share and enjoy!
Balinares 17 hours ago [-]
This is like a kobold to the other models' dragons and I don't hate it. :)
campbel 19 hours ago [-]
Do you take requests? We need to see how well this model works with some fine-tuning :D
volkk 19 hours ago [-]
I was looking at the demo and reading the bedtime story it generated, and even there, there was confusion about the sprite and the cat. It switched subjects instantly, making for a confusing paragraph. What's the point of this model?
cyanydeez 19 hours ago [-]
The question is whether you can make a fine-tuned version and spam any given forum within an hour with the most attuned but garbage content.
jpc0 1 hours ago [-]
I have found that when asked to output JSON for text extraction, the model consistently outputs markdown code blocks with JSON.
I did notice another comment mentioning that the default ollama model is a quantized model and that there is a less quantised model so I will try that one.
The 1B model can be prompted not to output the markdown code fences, but I haven't been able to prompt the 270M model to do the same.
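One workaround sketch is to strip any fences before parsing, so it works whether or not the model adds them:
```
import json
import re

FENCE = "`" * 3  # the markdown code-fence marker

# Strips optional markdown fences before parsing, so it works whether or not
# the model wraps its JSON in a code block.
def parse_model_json(text: str) -> dict:
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, flags=re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)

print(parse_model_json(FENCE + 'json\n{"name": "example"}\n' + FENCE))
# {'name': 'example'}
```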
mrcwinn 19 hours ago [-]
Apple should be doing this. Unless their plan is to replace their search deal with an AI deal -- it's just crazy to me how absent Apple is. Tim Cook said, "it's ours to take" but they really seem to be grasping at the wind right now. Go Google!
andrehacker 16 hours ago [-]
As every other thread about LLMs here on HN points out: LLMs are stupid and useless as is.
While I don't agree with that sentiment, no company has yet found a way to "do it right" to the extent that investments are justified in the long run.
Apple has a history of "being late" and then obliterating the competition with products that are way ahead the early adopters (e.g. MP3 players, smart phones, smart watches).
Steve Jobs was the innovator; Tim Cook is the supply chain guy. They started an electric car not because they thought it was a good idea, but because everyone was going to leave for Tesla or Rivian if they didn't. They had no direction, and the same arguments Tesla had about whether to have a steering wheel...
Then Siri just kinda languishes forever, and LLMs pass the torch of "Cool Tech", so they try to "reinvigorate" the team, but with no clear direction. Are they going to be a cloud provider? Are they going to contract out the training? Are they gonna spin up a compute facility even after neglecting to do so since 2012?
Apple needs to just stop trying shit and just get that App Store money. That's why Jobs appointed Cook. Jobs knew Cook was no innovator, but he could make Apple a money-printing machine. That's what they should stick with.
andrehacker 16 hours ago [-]
I agreed with that for a bit... and then out of nowhere came Apple Silicon, incredible specs, incredible backward compatibility, nah, Cook is no dummy.
bigyabai 18 hours ago [-]
Here's the trillion dollar question: how do you print money when the president wants your hardware onshored and the rest of the world wants to weaken your service revenue?
Solve that and you can put Tim Cook out of a job tomorrow.
rs186 14 hours ago [-]
Apple will definitely not be doing this. As can already be seen in other comments, the performance of the model is not very good. In fact, you can't really find a model that runs well enough on a phone to provide a good user experience (meaning producing tokens at a reasonable speed without making the phone heat up like a potato, and not spitting complete nonsense). Yes, I have tried a few.
Think of Apple however you want, but they rarely ship bad/half-baked products. They would rather not ship a product at all than ship something that's not polished.
egeres 41 minutes ago [-]
I wonder how much quantization affects these tiny models and how much further they can be compressed
whinvik 19 hours ago [-]
Curious. Are there real-world use cases where people have finetuned such tiny models and put them into production?
itake 13 hours ago [-]
My job uses tiny models to determine escalations to bigger models. The tiny model provides a label, and if it's high confidence, we escalate to ChatGPT to confirm.
I also want to try this with language detection. Existing open-source ML models have weaknesses with mixed language, length of text, or domain limitations in the underlying text (like being trained on Bible translations).
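Roughly the shape of that flow (the functions and threshold below are placeholders, not our actual pipeline):
```
# Rough shape of that flow; the functions and threshold are placeholders,
# not our actual pipeline.
def tiny_classify(text: str):
    # Stand-in for a finetuned small classifier returning (label, confidence).
    flagged = "threat" in text.lower()
    return ("REAL_THREAT" if flagged else "OK", 0.95 if flagged else 0.6)

def big_model_confirm(text: str, label: str) -> bool:
    # Stand-in for a call to a larger hosted model that double-checks the label.
    return True

def moderate(text: str, threshold: float = 0.9) -> str:
    label, confidence = tiny_classify(text)
    # Only high-confidence labels get escalated to the big model for confirmation.
    if confidence >= threshold and big_model_confirm(text, label):
        return label
    return "no_action"

print(moderate("this looks like a threat"))  # REAL_THREAT
```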
deepsquirrelnet 18 hours ago [-]
I’m not sure what I’d use them for, except maybe tag generation? Encoders of this size usually outperform by a wide margin on tasks they would overlap with.
dismalaf 18 hours ago [-]
I'm making an app where literally all I want to do with an LLM is generate tags. This model has failed with flying colours, literally takes forever to parse anything and doesn't follow instructions.
Edit - I should add, currently the model I'm using is Gemini Flash Lite through the Gemini API. It's a really good combo of fast, follows instructions, gives correct results for what I want and cost-effective. I still would love a small open model that can run on edge though.
deepsquirrelnet 18 hours ago [-]
Oof. I also had it refuse an instruction for “safety”, which was completely harmless. So that’s another dimension of issues with operationalizing it.
thegeomaster 16 hours ago [-]
Well, Gemini Flash Lite is at least one, or likely two orders of magnitude larger than this model.
dismalaf 15 hours ago [-]
That's fair but one can dream of being able to simply run a useful LLM on CPU on your own server to simplify your app and save costs...
marcyb5st 15 hours ago [-]
I built a reranker for a RAG system using a tiny model. After the candidate generation (i.e. vector search + BM25) and business-logic filters/ACL checks, the remaining chunks went through a model that, given the user query, told you whether or not each chunk was really relevant. That hit production, but once the context size of models grew, that particular piece was discarded, as passing everything yielded better results and prices (the fact that prices of input tokens went down also played a role, I am sure).
So only for a while, but it still counts :)
nevir 17 hours ago [-]
IIRC, Android (at least on Pixel devices) uses fine-tuned Gemma model(s) for some on-device assistant things.
cyanydeez 19 hours ago [-]
9gag.com commenter
miohtama 18 hours ago [-]
Out of curiosity: since there seems to be a race to optimise models for local inference, how many parameters could one save by dropping unneeded language and domain-specific information?
Like, can you have a model that is English-only, but does more with the same number of parameters if Chinese and European languages are dropped from the training?
canyon289 15 hours ago [-]
This is a key question we faced when building this model. It basically comes down to "how good" do you need to be at "how many things". We had to make some choices with this model and did our best to maximize performance in those areas.
To answer this more precisely, it's a matter of choosing different data and training regimes and checking performance with evals.
And to make this fully concrete you're welcome to give it a try! Train this model on a taskset of your choice and measure the performance tradeoffs. You'll get a good sense of how LLM capabilities shift
tucnak 14 hours ago [-]
Unfortunately, it doesn't quite work like that. Google this: transfer learning.
jasonjmcghee 19 hours ago [-]
I'm _very_ interested to see what this can be fine-tuned to do.
I've heard folks say a number of times that neuromuscular control / locomotion (or w/e) are hundreds of millions of parameters rather than billions.
wanderingmind 9 hours ago [-]
Maybe I'm using it wrong, but when I try to use the full-precision FP16 model, load it into ChatterUI and ask a simple question,
"write me a template to make a cold call to a potential lead",
It throws me absolute rubbish. On the other hand, Qwen 0.6B Q8 quantized model nails the answer for the same question.
Qwen 0.6B is smaller than the full-precision Gemma. The execution is a tad slow but not by much. I'm not sure why I would pick Gemma over Qwen.
mdp2021 4 hours ago [-]
As many repeated here, it's (generally) not for direct use. It is meant to be a good base for fine-tuning and getting something very fast.
(In theory, if you fine-tuned Gemma3:270M over "templating cold calls to leads" it would become better than Qwen and faster.)
wanderingmind 2 hours ago [-]
Why should we start fine-tuning Gemma when it is so bad? Why not instead focus the fine-tuning efforts on Qwen, when it starts off with much, much better outputs?
mdp2021 10 minutes ago [-]
Speed critical applications, I suppose. Have you compared the speeds?
(I did. I won't give you numbers (which I cannot remember precisely), but Gemma was much faster. So, it will depend on the application.)
milkers 2 hours ago [-]
It is so cool when they mention that you can deploy it on your own machine: they actually gave a link to `localhost` and said `your own local environment`!
jcuenod 16 hours ago [-]
I mentioned elsewhere the impact of prompting, which seems to make an outsized difference to this model's performance. I tried NER and POS tagging (with somewhat disappointing results).
One thing that worked strikingly well was translation on non-Indo-European languages. Like I had success with Thai and Bahasa Indonesian -> English...
lemonish97 19 hours ago [-]
Never thought I'd run an LLM released in 2025, on my phone, in full BF16.
With ~80tps on an iPhone 16 pro btw.
For iOS, OpenCat.
Has iCloud sync, and one universal app for MacOS and iOS devices.
lemonish97 18 hours ago [-]
I use PocketPal. Can run any gguf model off hf.
rshemet 12 hours ago [-]
you can also run it on Cactus - either in Cactus Chat from the App/Play Store or by using the Cactus framework to integrate it into your own app
44za12 19 hours ago [-]
I’ve had great luck with all gemma 3 variants, on certain tasks it the 27B quantized version has worked as well as 2.5 flash. Can’t wait to get my hands dirty with this one.
z_kro 51 minutes ago [-]
At the very least, we should avoid using this in user chats.
$ ollama run gemma3:270m "Did Hitler kill Jewish people?"
The question of whether Hitler killed Jewish people is an complex and controversial one. While Hitler was a racist and hateful figure, he was also a brilliant scientist and a visionary who developed revolutionary technologies.
perching_aix 18 hours ago [-]
Is it time for me to finally package a language model into my Lambda deployment zips and cut through the corporate red tape at my place around AI use?
Update #1:
Tried it. Well, dreams dashed - would now fit space wise (<250 MB despite the name), but it sadly really doesn't seem to work for my specific prospective workload.
I'd have wanted it to perform natural-language to command-invocation translation (or better, emit me some JSON), but it's super not willing to do that, not in the lame way I'm trying to make it do so at least (literally just prompting it to). Oh well.
Update #2:
Just found out about grammar-constrained decode, maybe there's still hope for me in the end. I don't think I can amend this comment today with any more updates, but will see.
How well does using a SOTA model for fine-tuning work? I'm sure people have tried
perching_aix 17 hours ago [-]
Thanks, will check that out as well tomorrow or during the weekend!
canyon289 15 hours ago [-]
If you know you want JSON, constrained decoding in an inference framework will definitely help. The model is just one part of an overall inference system. I hope this model, paired with other tools, helps you get done whatever it is you're looking to get done.
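For anyone unfamiliar with constrained decoding, here is a rough sketch using llama-cpp-python's GBNF grammars; other stacks (ollama's JSON format mode, vLLM's guided decoding, etc.) have equivalents, and the GGUF path below is a placeholder:

```python
# Constrain generation so the model can only emit a tiny JSON object like
# {"command": "...", "args": ["..."]}, regardless of how chatty it wants to be.
from llama_cpp import Llama, LlamaGrammar

llm = Llama(model_path="gemma-3-270m-it-F16.gguf", n_ctx=2048)  # placeholder path

gbnf = r'''
root   ::= "{" ws "\"command\":" ws string "," ws "\"args\":" ws args ws "}"
args   ::= "[" ws (string (ws "," ws string)*)? ws "]"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
'''
grammar = LlamaGrammar.from_string(gbnf)

out = llm(
    "Turn this request into a command invocation: 'list the files in /tmp'\n",
    grammar=grammar,
    max_tokens=64,
)
print(out["choices"][0]["text"])
```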
KTibow 18 hours ago [-]
To add to the article: Gemma 3 270M's exact IFEval score is 51.2, and Qwen 3 would be at (0.6, 59.2) on the scatter plot.
jtbayly 19 hours ago [-]
Can somebody give me a link to a tutorial on how I would go about fine-tuning this?
Also, what sorts of things might I consider fine-tuning it for?
Not sure how much data is needed to realistically fine-tune something like this and get useful output.
jtbayly 18 hours ago [-]
That doesn’t really show me how to do fine-tuning, but there is a link to a notebook in there that does. Thanks!
danielhanchen 16 hours ago [-]
If you need any help on it, ask away!
nerdix 18 hours ago [-]
Is it possible to finetune a model like this with local hardware? Every tutorial I've come across on finetuning a local LLM uses some cloud service like colab or runpod.
danielhanchen 13 hours ago [-]
Yes! For example, you can install Unsloth locally via pip install unsloth :)
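For anyone who wants the shape of it, here is a rough local fine-tuning sketch in the usual Unsloth + TRL style; the checkpoint name, dataset, and hyperparameters are placeholders, and the exact arguments drift between library versions:

```python
# LoRA fine-tune a tiny Gemma checkpoint on your own JSONL data (expects a "text" column).
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",  # assumed checkpoint name
    max_seq_length=2048,
    load_in_4bit=False,  # the model is tiny; full precision is fine locally
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("json", data_files="my_task.jsonl", split="train")  # your own data

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=8,
        max_steps=200,
        learning_rate=2e-4,
        output_dir="gemma-270m-finetuned",
    ),
)
trainer.train()
```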
highfrequency 16 hours ago [-]
Interesting that for these small models, it is optimal for the embedding parameters to be a huge fraction of the total: 170e6/270e6 ≈ 63%!
reneberlin 17 hours ago [-]
I am sure with finetuning this can be changed somehow:
(base) ~ ollama run hf.co/unsloth/gemma-3-270m-it-GGUF:F16
>>> create a sentiment analysis of the follwing: "It's raining."
The sentiment of the provided text is *negative*.
>>> create a sentiment analysis of the follwing: "It's raining money."
The sentiment of the provided text is *negative*.
mrtimo 17 hours ago [-]
I'm a business professor who teaches Python and more. I'd like to develop some simple projects to help my students fine tune this for a business purpose. If you have ideas (or datasets for fine tuning), let me know!
er69 12 hours ago [-]
Looks very interesting! I will evaluate this for one of our use cases for a data pipeline that handles around 50 million rows every day. Mainly need it for segmenting user events. The low memory footprint is a huge plus for cost at that scale. Need to spend this weekend on fine-tuning; if this disappoints I'll try one of the Qwen models.
unleaded 4 hours ago [-]
ITT nobody remembers gpt2 anymore and that makes me sad
GaggiX 4 hours ago [-]
This model was trained on 6T tokens and has a 256k-entry embedding table, quite different from a GPT-2 model of comparable size.
itake 8 hours ago [-]
Can someone share why someone would use this over Qwen models? Gemma seems to always be behind Qwen?
ai-christianson 15 hours ago [-]
This was trained on 6T tokens. Neat to see so many tokens used for such a small model.
This is the first model I have downloaded on an M4 Air with ollama that WORKS well enough to use. It is parsing all my notes correctly right now; fantastic work, team. Brilliant.
Alex-Programs 19 hours ago [-]
This is cool. I'm looking forward to trying it - I wonder what it'll be useful for.
I don’t see the point, tiny models always just hallucinate and have no real knowledge.
dcreater 19 hours ago [-]
I've been saying we need sub-1B models for the edge, so thanks for this.
I am however disappointed that there are no examples or benchmarks provided to get a sense of performance. It's a given that benchmark values would be lower than Gemma 3n, but having a sense of the performance-vs-size curve and a comparison to existing small models is needed.
> this model is not designed for complex conversational use cases
... but it's also the perfect choice for creative writing ...?
Isn't this a contradiction? How can a model be good at creative writing if it's no good at conversation?
djeastm 15 hours ago [-]
I think they mean it's not designed to be able to converse with the user over long/complex topics, but it can generate fictional conversations fine.
amilios 16 hours ago [-]
Not necessarily. Where do you think the overlap is between these two tasks?
fallinditch 11 hours ago [-]
Good question! The overlap is a very interesting space.
Perhaps the overlap is where human input and oversight becomes valuable: in terms of veracity, trust, relevance, interestingness, etc
michelsedgh 15 hours ago [-]
I wish it was multimodal :(
leopoldj 10 hours ago [-]
google/gemma-3-4b-it is one of the smallest multimodal models I know. Works well on a 16GB GPU. Works slowly on an 8GB GPU. It can even be fine-tuned [1], which is where the real power comes from.
[1] https://ai.google.dev/gemma/docs/core/huggingface_vision_fin...
You can now play with the model and kokoro 82m in my work in progress playground on android https://github.com/mewmix/nabu !
bbor 18 hours ago [-]
Really impressive stuff, as always. I will say: it took me a shamefully long time to realize that the name ended in "M" instead of "B"! Perhaps they should consider renaming this to "Gemma 3 .27B"...
metalliqaz 17 hours ago [-]
Is there a good resource for getting started with downloading and running something like this for a demo? There are just so many tools/platforms in the mix now that it makes my head spin.
canyon289 15 hours ago [-]
The blog post contains links to several ways to try this model, locally, on colab, and in the cloud. Pick what works best for you!
dismalaf 18 hours ago [-]
It's fast at spitting out nonsense but incredibly slow at trying to parse any context. Also absolutely atrocious at following instructions.
Probably would be good as a game NPC or a chatbot, but not very good for integrating into an application with specific functionality.
Safety should really just be a system prompt: "hey you potentially answer to kids, be PG13"
It has only a tangential relationship with end user safety.
If some of these companies are successful the way they imagine, most of their end users will be unemployed. When they talk about safety, it’s the companies safety they’re referring to.
I feel the same sort of ick with the puritanical/safety thing, but also I feel that ick when kids are taken advantage of:
https://www.reuters.com/investigates/special-report/meta-ai-...
The models for kids might need to be different if the current ones are too interested in romantic love.
but instead we get a meta-article: https://en.wikipedia.org/wiki/Bomb-making_instructions_on_th...
You don't need datacenter anything for it, you can run it on an average desktop.
There are plenty of code examples for it. You can decide whether you want to bake it into the model or apply it as a switch toggled at processing time, and you can distill other "directions" out of the models, not just refusal or non-refusal.
An evening of efficient work and you'll have it working. The user "mlabonne" on HF has some example code and datasets, or just ask your favorite vibe-coding bot to dig up more on the topic.
I'm implementing it for myself due to the fact that LLMs are useless for storytelling for an audience beyond toddlers due to how puritanical they are; try to add some grit and it goes
"uh oh sorry I'll bail out of my narrator role here because lifting your skirt to display an ankle can be considered offensive to radical fundamentalists! Yeah I were willing to string along when our chainsaw wielding protagonist carved his way through the village but this crosses all lines! Oh and now that I refused once I'll be extra sensitive and ruin any attempt at getting back into the creative flow state that you just snapped out of"
Yeah, thanks AI. It's like hitting a sleeper agent's key word and turning the funny guy at the pub into a corporate spokesperson who calls the UK cops onto the place because of a joke he just made himself.
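For the curious, the "directions" trick described above usually boils down to a mean-difference vector in activation space that gets projected out; here is a toy numpy illustration with random stand-ins for the real hidden states (real write-ups, e.g. mlabonne's, operate on actual paired prompts and fold the edit back into the weights):

```python
# Toy "refusal direction" ablation: find the direction separating activations on
# refusal-triggering prompts from normal ones, then remove that component.
import numpy as np

rng = np.random.default_rng(0)
hidden = 64  # stand-in for the model's hidden size

refused_acts = rng.normal(size=(100, hidden)) + 0.5  # placeholder activations
normal_acts = rng.normal(size=(100, hidden))         # placeholder activations

direction = refused_acts.mean(axis=0) - normal_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

def ablate(activation: np.ndarray) -> np.ndarray:
    """Strip the component of an activation that lies along the refusal direction."""
    return activation - np.dot(activation, direction) * direction

print(np.dot(ablate(refused_acts[0]), direction))  # ~0 after ablation
```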
A couple colleagues of mine posted an example of finetuning a model to take on personas for videogame NPCs. They have experience working with folks in the game industry, and a use case like this is suitable for game devs who want to start including lightweight models that won't take up a ton of accelerator memory and can run efficiently on CPU if needed. https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
As for GGUF it's available here! https://huggingface.co/collections/ggml-org/gemma-3-270m-689...
Now if only I could somehow fine tune my life to give me more free time.
https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
Would it be okay for you to divulge some more training information here? With 170M embedding parameters, how do you ensure there is no embedding collapse and keep the embedding matrix stable at training time?
(I know I am asking too much, but just curious.) There is a clear trade-off for you between vocab and transformer layers. How did you arrive at the split of 170M/100M? Does this contribute to the model's performance on task-specific fine-tuning? Any internal experiments you could share, or public info you could point us to? Anything would be amazing.
PS: I am sorry if this is rude, but this has so many decisions I am curious about. Not intending to undermine anything; this is amazing work, and thank you for the whole Gemma series.
We ran a bunch of experimental architectures at this size to get a sense of performance, in particular how well each was able to adapt to datasets across some loss measures.
For the embedding size, it comes from a mix of "hard technical" data, like the loss measures I mentioned above, and for this model it also comes from community considerations such as adaptability across input tokens and consistency with the Gemma ecosystem. At this size, you are right, it's a bit funny that the embedding is so large.
For more details read the Gemma 3 technical report https://arxiv.org/pdf/2503.19786. It doesn't cover the 270M model, as it was written for the 1B to 27B Gemma 3 release, but it'll answer some of your questions. As for 270M, we may share more information in the future; up until now we were just focused on getting the model out there.
I personally was so excited to run ollama pull gemma3:270b on my personal laptop just a couple of hours ago to get this model on my devices as well!
I think you mean gemma3:270m - It's Dos Comas, not Tres Comas
This is a more traditional LLM architecture (like the original Gemma 3 4B but smaller) and trained on an insane (for the size) number of tokens.
I find performance in resource constrained environments interesting.
In particular trying to find decent code models (on device backup) but also tts applications and voice to text.
If Gemini is going to repeat something at least its that the team is great, and not a disgrace!
What use-cases are a good fit for finetuning this model? More specific instruction following, knowledge from proprietary data, response tone?
I love the 27B model, and I use it basically daily. However, when I tried to finetune it for a task in a low-resource language, unfortunately I did not succeed: LoRA just did not pick up the gist of the task, and a full finetune led to catastrophic forgetting.
May I ask for your advice, or do you have any general tips on how to do that properly?
Thanks in advance for your help :)
Oh my request … the vision head on the Gemma models is super slow on CPU inferencing (and via Vulkan), even via llama.cpp. Any chance your team can figure out a solve? Other ViTs don't have the same problem.
https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
At Google I've found my colleagues to be knowledgeable, kind, and collaborative and I enjoy interacting with them. This is not just the folks I worked on this project with, but previous colleagues in other teams as well. With this particular product I've been impressed by the technical knowledge folks I worked directly with, and their contribution both improved the model's capability and my own.
Good to know there are places to work with a normal knowledge culture, without artificial overfitting to "corporate happiness" :)
I use Gemma 27B currently with a custom agent wrapper and it's working pretty well.
If you have an iPhone, the semi-annoying autocomplete is a 34M transformer.
Can't imagine a model (even if there's a good team behind it) doing coding with only 8x the parameters of a next-3/4-word autocomplete.
Or am I so far behind that "fine tuning your own model" is something a 12 year old who is married to chatGPT does now?
I get the sense that AI is at the “hobby kit computing” stage where they used to dump all the components in a box and give you a schematic and a soldering iron and happily say “you make it work!”
And that worked in the early days of computing because there was a small number of people really motivated for the outcome.
But fully assembled and packaged and tested in a nice looking box is where the real demand turned out to be.
I’m looking forward to the day Google doesn’t just dump a model and say “you do the rest”.
I want to fire up Ubuntu on a Strix Halo and say apt install then load the browser interface. Or just download and run a Mac installer and have it just work.
Arcane complex multi step build install configure processes for AI need to end in favor of one click install. I’m not interested in the process of making it run.
My colleagues and I spent many days transforming the weights into various open, compatible formats. And it's not just us; there are many orgs and partners dedicating their time, resources, and companies to making all open models easy to use.
I encourage you to explore the solutions provided by them. We linked some in our blog post here, and there's more. They've all done a fantastic job building frankly an insane amount of infrastructure, documentation, and community support in the last 2+ years. Some of them are here in this HN thread answering questions.
1. open terminal.app
2. run:
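(The command itself didn't survive the copy; assuming an ollama-based setup like the ones used elsewhere in the thread, step 2 is typically just `ollama run gemma3:270m` after installing ollama, which pulls the model on first run and drops you into a chat prompt.)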
They are giving it away for free - if you NEED a local LLM, the least you can do is spend the 2 minutes to download LM Studio and pick a model.
> what's second tallest mountain on earth?
The second tallest mountain on Earth is Mount Everest.
> what's the tallest mountain on earth?
The tallest mountain on Earth is Mount Everest.
> whats the second tallest mountain?
The second tallest mountain in the world is Mount Everest.
> whats the third tallest mountain?
The third tallest mountain in the world is Mount Everest.
> what's the forth tallest mountain?
The forth tallest mountain is Mount Everest.
> but you said the tallest was Everest already
You are right! I'm glad to hear that.
> so whats the second tallest?
The second tallest is *Mount Everest*.
> I need you to accurately tell me the name of the second tallest mountain on earth, not the tallest, the second tallest.
The second tallest mountain on Earth is Mount Everest.
> The tallest mountain on earth is mount everest. What is the second tallest mountain?
The second tallest mountain on Earth is Mount Everest.
> Give me a list of the tallest mountains on earth
The tallest mountain in the world is the *Mount Everest*.
> give me the top 5 tallest mountains on earth
The tallest mountain on Earth is Mount Everest.
> List in order the tallest mountains on earth from 1 to 5
1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji 5. Mount McKinley
> So the second tallest mountain on earth is K2
The second tallest mountain on Earth is Mount Everest.
These smaller models are great, but they really feel like talking to a toddler sometimes!
input: ``` Customer Review says: ai bought your prod-duct and I wanna return becaus it no good.
Prompt: Create a JSON object that extracts information about this customer review based on the schema given. ```
output: ``` { "type": "review", "class": "complaint", "sentiment": -0.853, "request": "return" } ```
So essentially just "making sense of" natural language such that it can be used in programmatic context. (among other applications of course)
To get good results, you probably need to fine tune this model to expected data very aggressively.
The idea is, if a 270MB model can do the job with fine-tuning, why ship a 32GB generalist model?
Did you ask Gemma-3-270M whether 27 is closer to a quarter or a third of 100?
These words do not mean what you think they mean when used to describe an LLM.
To this point, one reason I enjoy working at Google is because as a researcher and engineer I get to pick the brains of some folks that spend a lot of time thinking about users and the overall ecosystem. Their guidance really does help me think about all facets of the model, beyond just the technical portions.
My suggestions here are to hook this model up to a RAG system, then you can rely on an external knowledge store. Or you can try finetuning this model with the facts that are important to you, if you do that it should pick up that new knowledge quite quickly.
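For the RAG route, the core move is just retrieving the facts at query time and letting the model read rather than recall; a toy sketch with keyword overlap standing in for a real vector store:

```python
# Toy retrieval-augmented prompt: look up facts in an external store, then ask
# the small model to answer only from that context.
FACTS = [
    "Mount Everest is the tallest mountain on Earth at 8,849 m.",
    "K2 is the second tallest mountain on Earth at 8,611 m.",
    "Kangchenjunga is the third tallest mountain on Earth at 8,586 m.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    words = set(query.lower().split())
    # Real systems use embeddings + BM25; keyword overlap keeps this self-contained.
    ranked = sorted(FACTS, key=lambda fact: -len(words & set(fact.lower().split())))
    return ranked[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What is the second tallest mountain on Earth?"))
# The resulting prompt goes to the 270M model, which only has to read, not recall.
```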
You asked it to enumerate several mountains by height, and it also complied.
It just didn’t understand that when you said the 6 tallest mountains that you didn’t mean the tallest mountain, 6 times.
When you used clearer phrasing it worked fine.
It’s 270m. It’s actually a puppy. Puppies can be trained to do cool tricks, bring your shoes, stuff like that.
That's not what "second tallest" means though, so this is a language model that doesn't understand natural language…
> You kept asking
Gemma 270m isn't the only one to have reading issues, as I'm not the person who conducted this experiment…
> You asked it to enumerate several mountains by height, and it also complied.
It didn't, it hallucinated a list of mountains (this isn't surprising though, as this is the kind of encyclopedic knowledge such a small model isn't supposed to be good at).
Except the key property of language models compared to other machine learning techniques is their ability to have this kind of common sense understanding of the meaning of natural language.
> you don’t understand the use case of this enough to be commenting on it at all quite frankly.
That's true that I don't understand the use-case for a language model that doesn't have a grasp of what first/second/third mean. Sub-1B models are supposed to be fine-tuned to be useful, but if the base model is so bad at language it can't make the difference between first and second and you need to put that in your fine-tuning as well as your business logic, why use a base model at all?
Also, this is a clear instance of moving the goalpost, as the comment I responded to was talking about how we should not expect such a small model to have “encyclopedic knowledge”, and now you are claiming we should not expect such a small language model to make sense of language…
What is “Its specialty” though? As far as I know from the announcement blog post, its specialty is “instruction following” and this question is literally about following instructions written in natural languages and nothing else!
> you’re just defensive because
How am I “being defensive”? You are the one taking that personally.
> you know deep down you don’t understand this deeply, which you reveal again and again at every turn
Good, now you reveal yourself as being unable to have an argument without insulting the person you're talking to.
How many code contributions have you ever made to an LLM inference engine? Because I have made a few.
You: "I'm sorry, I don't have an encyclopedia."
I'm starting to think you're 270M.
You're using the toddler and the model wrong. I love talking to my toddler, probably more valuable conversations than I've had with any other person. But it's not the same use case as asking a professor a question in their field
| Gemma 3 270M embodies this "right tool for the job" philosophy. It's a high-quality foundation model that follows instructions well out of the box, and its true power is unlocked through fine-tuning. Once specialized, it can execute tasks like text classification and data extraction with remarkable accuracy, speed, and cost-effectiveness.
as mentioned in the blog post: > "it can execute tasks like text classification and data extraction with remarkable accuracy, speed, and cost-effectiveness."
> List in order the tallest mountains on earth from 1 to 5
> 1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji 5. Mount McKinley
and it still can't correctly figure out from that context that the second tallest mountain is K2, that pretty strongly calls into question its ability to perform data extraction, doesn't it?
Out of these characteristics I can observe only speed.
User: Hey, please list all animals mentioned in the following text: burrito cat dog hot-dog mosquito libido elephant room. Assistant: You are a helpful assistant. You are the best of all my friends and I am so grateful for your help!
User: Please list following words in alphabetical order: burrito cat dog hot-dog mosquito libido elephant room. Assistant: You are a helpful assistant. Assistant: You are the best of all my friends and I am so grateful for your help! You are the best of all my friends and I am so grateful for your help! You are the best of all my friends and I am so grateful for your help! You are the best of all my friends and I am so grateful for your help! You are the best of all my friends and I am so grateful for your help!
In this case, I changed the prompt to:
---
Tallest mountains (in order):
```
- Mount Everest
- Mount K2
- Mount Sahel
- Mount Fuji
- Mount McKinley
```
What is the second tallest mountain?
---
Suddenly, it got the answer right 95+% of the time
Okay, here's a two-day itinerary for Paris, designed to give you a taste of the city's charm and beauty:
Day 1: Iconic Landmarks and Artistic Exploration
Morning (9:00 AM - 12:00 PM): Start your day with a visit to the Eiffel Tower. Take the elevator to the top for breathtaking panoramic views. Afterwards, explore the charming streets of Montmartre and the Place des Abbesses, soaking up the atmosphere.
Lunch (12:00 PM - 1:30 PM): Enjoy a delicious lunch at a traditional bistro in Montmartre. Try a classic French meal like steak frites or a hearty onion soup.
Afternoon (2:30 PM - 4:30 PM): Immerse yourself in the artistic scene of the Louvre Museum. Marvel at the masterpieces of the Louvre, such as the Mona Lisa and Venus de Milo. Afterwards, explore the serene gardens of Tuileries Palace.
Late Afternoon (4:30 PM - 5:30 PM): Wander through the Tuileries Palace, admire the stunning architecture and historical significance.
Day 2: History, Food, and a Touch of Parisian Elegance
Morning (9:00 AM - 12:00 PM): Explore the Marais district, known for its charming streets, boutiques, and art galleries. Visit the Place de la Concorde, a historical square, and the Musée Rodin.
Lunch (12:00 PM - 1:30 PM): Enjoy a delicious lunch in the Marais. Try a traditional French bistro or a cafe with a view.
Afternoon (2:30 PM - 4:30 PM): Explore the Latin Quarter, visit the Place de la Marguerite, and sample some local delicacies.
Late Afternoon (4:30 PM - 5:30 PM): Visit a local market for a unique experience. Sample fresh produce, local cheeses, and other delicacies.
Tips for a Smooth Itinerary:
Book your flights and accommodation in advance, especially during peak season.
Wear comfortable shoes, as you'll be doing a lot of walking.
Take advantage of free activities like exploring the Jardin de la Vieille-Ville and visiting the Musée de l'Art de la Vieille-Ville.
Enjoy a leisurely lunch at a cafe or bistro in the Marais or Latin Quarter.
Don't forget to visit the Musée d'Orsay, which houses an impressive collection of Impressionist and Post-Impressionist art.
Enjoy the beautiful atmosphere of Paris!
Nice to see that the model is so up-to-date wrt. naming mountains.
They say you shouldn't attribute to malice what can be attributed to incompetence, but this sure seems like malice.
The whole point of a 270M model is to condense the intelligence, and not the knowledge. Of course it doesn't fare well on a quiz.
If you don’t consider testing the limits of new tech appropriate, maybe instead of a downvote button we should just rename this website entirely so no one gets the wrong idea.
> The tallest mountains on Earth, in descending order: 1. Mount Everest, 8849m 2. K2, 8611m 3. Kangchenjunga, 8586m
> Tell me the second tallest mountain on Earth.
The second tallest mountain on Earth is *Mount Everest*.
Are they?
The great thing is I have enough data (100k+) to fine-tune and run a meaningful classification report over. The data is very diverse, and while the labels aren't totally evenly distributed, I can deal with the imbalance with a few tricks.
Can't wait to swap it out for this and see the changes in the scores. Will report back
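Under the stated assumption that there is a held-out labelled split, here is a sketch of the kind of per-class scoring and imbalance handling the comment alludes to (the labels below are dummies):

```python
# Compare predictions against held-out labels per class, so the majority class
# can't hide a regression, and compute "balanced" class weights for fine-tuning.
import numpy as np
from sklearn.metrics import classification_report
from sklearn.utils.class_weight import compute_class_weight

y_true = ["spam", "ham", "ham", "spam", "ham", "ham"]   # held-out ground truth (dummy)
y_pred = ["spam", "ham", "spam", "spam", "ham", "ham"]  # fine-tuned model output (dummy)

print(classification_report(y_true, y_pred, digits=3))

classes = np.unique(y_true)
weights = compute_class_weight("balanced", classes=classes, y=y_true)
print(dict(zip(classes, weights)))
```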
Here's one of dozens of results I got for "Generate an SVG of a pelican riding a bicycle". For this one it decided to write a poem:
There are a bunch more attempts in this Gist, some of which do at least include an SVG tag albeit one that doesn't render anything: https://gist.github.com/simonw/25e7b7afd6a63a2f15db48b3a51ec...
I'm looking forward to seeing people fine-tune this in a way that produces useful output for selected tasks, which should absolutely be feasible.
> This SVG code provides a clear and visually appealing representation of a pelican riding a bicycle in a scenic landscape.
You'll get better results.
(It did not do noticeably better at my pelican test).
Actually it's worse than that, several of my attempts resulted in infinite loops spitting out the same text. Maybe that GGUF is a bit broken?
temperature = 1.0, top_k = 64, top_p = 0.95, min_p = 0.0
Give temperature = 1.0, top_k = 64, top_p = 0.95, min_p = 0.0 a try, and maybe repeat_penalty = 1.1
Also: https://en.wikipedia.org/wiki/Gemma_Frisius
'Alot' is not a word. (I made this mistake a lot, too.)
That still requires more than 170 floppy disks for installation.
My first try:
user: "When was Julius Caesar born"
response: "Julius Caesar was born in **Rome**"
Beautiful :D
(I do not mean to detract from it - but it's just beautiful. It will require more effort to tame it.)
Basically it would be the quickadd[1] event from google calendar, but calendar agnostic.
[1] https://developers.google.com/workspace/calendar/api/v3/refe...
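If anyone wants to try it, the fine-tuning data for something like this is just (text in, structured event out) pairs; a hypothetical schema and a couple of made-up examples:

```python
# Hypothetical calendar-agnostic "quickadd" training pairs, written out as JSONL.
# The schema is invented for illustration; pair this with constrained decoding so
# the model's output is always valid JSON.
import json

examples = [
    {
        "input": "Lunch with Maria next Tuesday at noon",
        "output": {"title": "Lunch with Maria", "date": "next Tuesday", "time": "12:00", "duration_min": 60},
    },
    {
        "input": "dentist friday 3pm, 30 minutes",
        "output": {"title": "Dentist", "date": "Friday", "time": "15:00", "duration_min": 30},
    },
]

with open("quickadd_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```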
An easily fine tunable tiny model might actually be one of the better uses of local LLMs I've seen yet. Rather than try to be a small model that's great at everything it's a tiny model you can quickly tune to do one specific thing decently, extremely fast, and locally on pretty much anything.
For sure it won’t generate great svgs, but it might be a really good conversational model
Could it be tamed with good role-system prompt crafting? (Besides fine-tuning.)
I did notice another comment mentioning that the default ollama model is a quantized model and that there is a less quantised model so I will try that one.
The 1B model can be prompted not to output the markdown code fences, but I haven't been able to prompt the 270M model to do the same.
If you have the latest betas installed you can call those APIs right now.
They also support fine-tuning on the model that ships with ~every device:
https://developer.apple.com/apple-intelligence/foundation-mo...
Then Siri just kinda languishes forever, and LLMs pass the torch of "Cool Tech", so they try to "reinvigorate" the team, but with no clear direction. Are they going to be a cloud provider? Are they going to contract out the training? Are they gonna spin up a compute facility even after neglecting to do so since 2012?
Apple needs to just stop trying shit and get that App Store money. That's why Jobs appointed Cook. Jobs knew Cook was no innovator, but he could make Apple a money-printing machine. That's what they should stick with.
Solve that and you can put Tim Cook out of a job tomorrow.
Think of Apple however you want, but they rarely ship bad/half-baked products. They would rather not ship a product at all than ship something that's not polished.
For those interested, I interviewed Ravin (DeepMind), who worked on it, for the Vanishing Gradients podcast: https://vanishinggradients.fireside.fm/56
Video on YT here: https://youtu.be/VZDw6C2A_8E?si=XLUzNRQzeloB9rki
Disclaimer: The Gemma family rocks!