Open Weights

From P2P Foundation


Definition

Heather Meeker:

"The following are requirements to meet the Open Weights definition:

- The model must be provided along with access to the NNWs as necessary to study how the model was trained

- The model must be provided along with information on how the model was trained, including the following:

 -- model architecture
 -- training methodology
 -- description of the data used to train the model and their provenance
 -- hyperparameter configurations.

- The license must allow all recipients to use the models for any purpose.

- The license must allow all recipients to modify the model for any purpose.

- The license must not discriminate against any user, industry, or purpose.

- The model must be provided along with access to the NNWs as necessary to study how the model was trained. This includes information on model architecture, training methodology, hyperparameter configurations, as well as weights.

- The software used to train the model must be provided under an open source license, or in the public domain.

- The license must allow all recipients to provide the model, or modifications of the model, to others.

- The license must allow any license notices or NNWs to be provided via an online reference."

(https://github.com/Open-Weights/Definition/commit/231445bce08dd1ff4bff04bf29e8c244d9fe98a2)
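As an illustration only, the training information the definition calls for (model architecture, training methodology, data description and provenance, hyperparameter configurations) could be shipped alongside the weights as a simple machine-readable record. The field names and values in the Python sketch below are hypothetical; the definition itself does not prescribe any particular format.

  # Hypothetical record of the training information required by the
  # Open Weights definition; field names and values are illustrative only.
  training_info = {
      "model_architecture": "decoder-only transformer, 24 layers, 16 attention heads",
      "training_methodology": "next-token prediction followed by supervised fine-tuning",
      "training_data": {
          "description": "web text plus permissively licensed source code",
          "provenance": "public web-crawl snapshots and open repositories",
      },
      "hyperparameters": {"learning_rate": 3e-4, "batch_size": 1024, "steps": 100_000},
      "training_software_license": "Apache-2.0",  # training code open source or public domain
  }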

Description

OSI:

"Open Weights refer to the final weights and biases of a trained neural network. These values, once locked in, determine how the model interprets input data and generates outputs. When AI developers share these parameters under an OSI Approved License, they empower others to fine-tune, adapt, or deploy the model for their own projects.


However, Open Weights differ significantly from Open Source AI because they do not include:

  1. Training code – The scripts or frameworks used to create and curate the training dataset.
  2. Training dataset – The full dataset used for training, when legally possible; as an alternative, when distribution of the training dataset is not legally possible, sufficiently detailed information about that data.
  3. Comprehensive data transparency – Full details about dataset composition, such as source domains, cleaning methods, or balancing techniques.


By withholding these critical elements, developers only provide a glimpse into the final state of the model, making it difficult for others to replicate, audit, or deeply understand the training process."

(https://opensource.org/ai/open-weights)
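To make the first sentence of the OSI description concrete: once the weights and biases are published, anyone can load those numbers into the matching architecture and compute outputs, or use them as a starting point for fine-tuning. The following is a minimal sketch under assumed conditions (a tiny two-layer network and a hypothetical parameter file named open_weights.npz); it is not taken from the OSI text.

  # Minimal sketch: the shared weights and biases alone determine how
  # inputs are mapped to outputs. File name and architecture are hypothetical.
  import numpy as np

  params = np.load("open_weights.npz")       # published parameter archive (hypothetical)
  W1, b1 = params["W1"], params["b1"]        # first-layer weights and biases
  W2, b2 = params["W2"], params["b2"]        # second-layer weights and biases

  def forward(x):
      # The output depends only on the input x and the loaded parameters,
      # so any recipient of the weights can run (or fine-tune) the model.
      hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
      return hidden @ W2 + b2

Nothing in this sketch reveals the training code, the training dataset, or how the data were processed, which is exactly the gap the OSI text describes.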