Open Source Large Language Models

From P2P Foundation

= open-source LLMs


Arthur Spirling:

"Researchers need to collaborate to develop open-source LLMs that are transparent and not dependent on a corporation’s favours.

It’s true that proprietary models are convenient and can be used out of the box. But it is imperative to invest in open-source LLMs, both by helping to build them and by using them for research. I’m optimistic that they will be adopted widely, just as open-source statistical software has been. Proprietary statistical programs were popular initially, but now most of my methodology community uses open-source platforms such as R or Python.

One open-source LLM, BLOOM, was released last July [2022]. BLOOM was built by New York City-based AI company Hugging Face and more than 1,000 volunteer researchers, and partially funded by the French government. Other efforts to build open-source LLMs are under way. Such projects are great, but I think we need even more collaboration and pooling of international resources and expertise. Open-source LLMs are generally not as well funded as the big corporate efforts. Also, they need to run to stand still: this field is moving so fast that versions of LLMs are becoming obsolete within weeks or months. The more academics who join these efforts, the better.

Using open-source LLMs is essential for reproducibility. Proprietors of closed LLMs can alter their product or its training data — which can change its outputs — at any time."