The Falcon LLM has been developed by the Technology Innovation Institute (TII) and has been open-sourced under the Apache 2.0 License.
Falcon LLM is a foundational LLM with 40B parameters, and has been trained on one trillion tokens. It has been trained on a very clean version of the CommonCrawl, along with data from carefully curated sources. As stated in the Falcon paper:
Challenging existing beliefs on data quality and LLMs, models trained on adequately filtered and deduplicated web data alone can match the performance of models trained on curated data.
And this seems to have a notable effect on the weightwatcher metrics.
Looking at the weightwatcher alphas, curiously, it appears that the 7b model is slightly better trained than the 40b model. This may arise because the 40b model is larger, but the data set it is trained on is not (or at least not large enough to fully utilize all the layers in this model). Moreover, the median scale is smaller for the 40b vs. the 7b model, which, again, is a bit unusual.
On the other hand, the median Dks is smaller for the 40b model, indicating that the power law fits are better. Moreover, neither model has many untrained layers, and we can conclude that most of the layers in both Falcon models are well trained.
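For readers new to these metrics: the weightwatcher alpha for a layer is the exponent of a power law fit to the tail of the eigenvalue spectrum of that layer's weight matrix, and Dks is the Kolmogorov-Smirnov distance measuring how good that fit is. The sketch below illustrates the idea with a simple Hill (maximum-likelihood) estimate of alpha on a random matrix standing in for a real Falcon layer; the actual weightwatcher fit is more careful (it scans for the optimal xmin), so this is only a toy illustration, not the library's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for one layer weight matrix W (not real Falcon weights)
W = rng.standard_normal((1024, 4096)) / np.sqrt(4096)

# Empirical spectral density: eigenvalues of W W^T, i.e. squared singular values
sv = np.linalg.svd(W, compute_uv=False)
evals = sv ** 2

# Hill MLE estimate of the power-law exponent alpha for the tail lambda >= xmin.
# weightwatcher searches over xmin; here we simply fix it at the median.
xmin = np.median(evals)
tail = evals[evals >= xmin]
alpha = 1.0 + len(tail) / np.sum(np.log(tail / xmin))

print(f"alpha = {alpha:.2f}")
```

Smaller alpha (roughly in the 2-6 range) generally indicates a better-trained layer, which is why comparing the alpha distributions of the 7b and 40b models is informative.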
The Falcon model is a great advance in the science of LLMs, and we hope weightwatcher can help users who want to understand how to use this model to its full potential.