I had wanted to run a large language model locally for a while, but the 4 GB of VRAM on my laptop's RTX 3050 ruled it out. So I was thrilled to discover that Apple's M2 Mac Mini, which I already had on hand, is a genuinely capable option for running local LLMs — a small dream come true.

User-friendliness is definitely one of LM Studio's strong points: you can browse and download models from Hugging Face directly in the app, and the interface is modern and easy to use. The settings menu even lets you switch the UI language; the translation is incomplete, but it's better than nothing.

After setting up LM Studio, I was able to download and load large models. The model browser lets you filter candidates by size, and I recommend choosing builds in the MLX format: MLX is Apple's machine-learning framework for Apple silicon, so MLX builds run natively on the unified memory of M-series chips with less data-transfer overhead.
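Once a model is loaded, LM Studio can also serve it over a local OpenAI-compatible HTTP API (by default at http://localhost:1234). Here is a minimal sketch of querying that server from Python using only the standard library — the model identifier is assumed to be the MLX build discussed below; adjust it to whatever you actually loaded:

```python
import json
from urllib import request

# LM Studio's local server (started from its Developer/Server tab) exposes an
# OpenAI-compatible API; 1234 is the default port.
BASE_URL = "http://localhost:1234/v1"
MODEL = "mlx-community/Qwen2-7B-Instruct-4bit"  # assumed model identifier

def build_chat_request(prompt: str, max_tokens: int = 256) -> tuple[str, bytes]:
    """Assemble the URL and JSON body for a chat-completion call."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return f"{BASE_URL}/chat/completions", json.dumps(payload).encode("utf-8")

def ask(prompt: str) -> str:
    """Send the request to a running LM Studio server and return the reply text."""
    url, body = build_chat_request(prompt)
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Because the API follows the OpenAI wire format, the official `openai` client also works by pointing its `base_url` at the local server.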

The Qwen2-7B-Instruct-4bit model ran smoothly on my Mac Mini with 8 GB of RAM, supporting a context length of up to 32k and generating at a reasonable ~19.9 tokens/s. Compared with the other models I tried — Phi 3, Gemma 2, DeepSeek, and Mistral, plus a RAG setup — Qwen2 stood out for its coherence and its understanding of Chinese. Even with 4-bit quantization it can feel a bit slow, but it was a pleasant surprise to see Qwen2 perform this well.
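To put those numbers in perspective, here is a back-of-envelope sketch (my own arithmetic, not anything LM Studio reports): at 4-bit quantization each parameter takes half a byte, so a 7B model needs roughly 3.5 GB for the weights alone — which is why it squeezes into 8 GB of RAM — and at ~19.9 tokens/s a 200-token reply takes about ten seconds.

```python
def quantized_weight_gb(n_params_billion: float, bits: int) -> float:
    """Back-of-envelope weight size in GB; ignores KV cache and runtime overhead."""
    return n_params_billion * 1e9 * bits / 8 / 1e9

def generation_time_s(n_tokens: int, tokens_per_s: float = 19.9) -> float:
    """Rough wall-clock time to generate n_tokens at a steady decode rate."""
    return n_tokens / tokens_per_s

print(quantized_weight_gb(7, 4))  # ≈ 3.5 GB of weights for a 7B model at 4-bit
print(generation_time_s(200))     # ≈ 10 s for a 200-token reply at 19.9 tok/s
```

The remaining few gigabytes have to cover the KV cache and the OS, which is why a long 32k context is the first thing to strain an 8 GB machine.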

For now, I will stick with Qwen2, but perhaps I will try Qwen2.5 in the future to see if it can outperform its predecessor. I am looking forward to exploring more features of LM Studio and fine-tuning my models for better performance.