Hermes Models


WeightWatcher Analysis of Hermes-3 Models
Introduction to Hermes
Hermes is a series of fine-tuned language models from Nous Research, aimed at strong general-purpose performance and generalization. As their names indicate, the models analyzed here are built on Meta's Llama 3 family of base models: Llama 3.1 for Hermes-3 and Llama 3 for Hermes-2-Theta.
Models Analyzed:
1. Hermes-3-Llama-3.1-8B: An unmerged model offering strong generalization and task-specific capabilities.
2. Hermes-2-Theta-Llama-3-8B: A merged model, slightly less performant than the unmerged Hermes-3-Llama-3.1-8B.
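For readers who want to reproduce this kind of analysis, the following is a minimal sketch of how per-layer alpha values could be computed with the weightwatcher package. It is not the exact procedure used for this page. The Hugging Face repository IDs are assumed from the model names above, and the helper name analyze_model is hypothetical.

```python
# Hugging Face repo IDs are an assumption based on the model names above.
MODEL_IDS = [
    "NousResearch/Hermes-3-Llama-3.1-8B",
    "NousResearch/Hermes-2-Theta-Llama-3-8B",
]


def analyze_model(model_id):
    """Return (per-layer details DataFrame, summary dict) for one model.

    Hypothetical helper for illustration only.
    """
    # Imported lazily: both packages are heavy and are only needed
    # when an analysis is actually run.
    import weightwatcher as ww
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(model_id)
    watcher = ww.WeightWatcher(model=model)
    details = watcher.analyze()             # one row per analyzed layer
    summary = watcher.get_summary(details)  # averaged metrics as a dict
    return details, summary


# Example usage (downloads several GB of weights, so it is left commented out):
# for model_id in MODEL_IDS:
#     details, summary = analyze_model(model_id)
#     print(model_id, summary)
```

The details table contains an alpha column with one value per analyzed layer, which is the quantity discussed in the results.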
WeightWatcher Results:
- Alpha Distribution:
  - In both models, most layer alpha values fall within the HTSR safe range (2 to 6), indicating well-conditioned layers.
  - Hermes-2-Theta has a denser concentration of alphas between 3 and 4.5, suggesting somewhat better overall stability.
- Mean Alpha:
  - Hermes-2-Theta has a marginally higher mean alpha, but the difference is minimal. The spread of alpha values is similar in both models.
- Overfitting Risk:
  - Few layers in either model have alpha below 2, the region HTSR theory associates with overfit layers, so overfitting risk appears minimal.
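The quantities behind the bullets above (share of layers in the 2 to 6 range, share in the 3 to 4.5 band, share below 2, and the mean and spread of alpha) can be computed from a list of per-layer alphas. The sketch below uses made-up alpha values purely for illustration; they are not measurements from either model, and the function name summarize_alphas is hypothetical.

```python
from statistics import mean, stdev


def summarize_alphas(alphas, safe=(2.0, 6.0), core=(3.0, 4.5)):
    """Summarize per-layer alpha values against the ranges used above."""
    n = len(alphas)
    return {
        "mean_alpha": mean(alphas),
        "std_alpha": stdev(alphas) if n > 1 else 0.0,
        # Share of layers inside the HTSR safe range (inclusive bounds).
        "frac_safe": sum(safe[0] <= a <= safe[1] for a in alphas) / n,
        # Share of layers inside the denser 3 to 4.5 band.
        "frac_core": sum(core[0] <= a <= core[1] for a in alphas) / n,
        # Share of layers below the safe range (possible overfitting).
        "frac_below_2": sum(a < safe[0] for a in alphas) / n,
    }


# Illustrative values only; NOT taken from either Hermes model.
example_alphas = [1.8, 2.5, 3.2, 3.9, 4.4, 5.1, 6.7, 3.5]
summary = summarize_alphas(example_alphas)
print(summary)
```

Applying the same function to the alpha column of each model's per-layer results would allow a side-by-side comparison of the two models.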
Conclusion
Both models are well-conditioned. Hermes-3-Llama-3.1-8B, being unmerged, offers slightly better performance, while Hermes-2-Theta-Llama-3-8B, despite being merged, still shows solid generalization and stability, making it a competitive alternative.


Hermes Models Included

Hermes Model Set Plots