Generative AI and the Digital Commons

From P2P Foundation

Revision as of 01:20, 23 March 2023

* Article / Working Paper: Generative AI and the Digital Commons. Saffron Huang and Divya Siddarth. AI Commons working group.

URL = https://cip.org/research/generative-ai-digital-commons

Description

"The goal of this work is to build models of governance for generative foundation models, such as GPT-3 and Stable Diffusion, that enable broadly shared benefit. Our initial hypothesis is that data is a high-value lever of governance. Many of these models are trained on publicly available data and use public infrastructure, but 1) may degrade the “digital commons” that they depend on, and 2) do not have processes in place to return value captured to data producers and stakeholders. Separately, we see a need to collect high quality information to understand e.g. the economic impacts of these models. This makes such data potentially one of the most important bottlenecks and impacted factors in this problem space.

Existing conceptions of data rights and protection (focusing largely on individually-owned data and associated privacy concerns) and copyright or licensing-based models offer some instructive priors, but are ill-suited for the issues that may arise from models trained on commons-based data. Forward-looking proposals include investments in standardized dataset/model disclosure and other kinds of transparency when it comes to generative models’ training and capabilities, consortia-based funding for monitoring/standards/auditing organizations, requirements or norms for GFM companies to contribute high quality data to the commons, and structures for shared ownership based on individual or community provision of fine-tuning data."

(https://cip.org/research/generative-ai-digital-commons)


Excerpts

Generative Foundation Models

Saffron Huang and Divya Siddarth:

"We will use the phrase “generative foundation models” (GFMs) to refer to machine learning systems that are: 1) “generative” — they generate text, images, or other sequences of information based on some input prompt, and 2) “foundation models” — neural network models trained on a large dataset comprising diverse origins and content, and can be adapted to a wide range of tasks. (Machine learning, or ML, is sometimes also referred to as artificial intelligence, or AI).

Examples of well-known GFMs are: OpenAI’s GPT family of language models (including ChatGPT) that take in text and generate text; DALLE-2, which takes in text/images and generates images; BERT, which takes in text and generates text; Stable Diffusion, which takes in text and generates images; Codex, which takes in code (a specific kind of text) and generates code.

We speak of “generative” foundation models, rather than foundation models at large per Bommasani et al, because we are concerned primarily with the applicability of these models for generating content, such as generating text, code or images. This may include tasks such as summarization (generating a summary of a text) or text continuation (continuing the text by iteratively predicting the next word) or creating images and videos.

Generative foundation models are a general technology, although they benefit from adaptation to the “downstream” tasks they are used for, e.g. by “fine-tuning” them by training on more specific datasets such that they can generate the appropriate material for the use context."

(https://cip.org/research/generative-ai-digital-commons)
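The two behaviours the excerpt describes, generating a continuation from an input prompt and adapting a generally trained model by further training on narrower data ("fine-tuning"), can be illustrated in miniature with a toy word-bigram generator. This is purely an illustrative sketch: the class and method names are invented, and real GFMs are large neural networks, not count tables.

```python
from collections import defaultdict
import random

class ToyGenerativeModel:
    """Toy word-bigram generator (illustrative only; not how real GFMs work).

    Shows the prompt-in, text-out pattern and how 'fine-tuning' is simply
    further training on a narrower corpus."""

    def __init__(self):
        # counts[w][v] = number of times word v followed word w in training text
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, corpus: str) -> None:
        # Called once on broad data ("pretraining") and optionally again on
        # task-specific data ("fine-tuning") -- both simply accumulate counts.
        words = corpus.split()
        for a, b in zip(words, words[1:]):
            self.counts[a][b] += 1

    def generate(self, prompt: str, length: int = 5, seed: int = 0) -> str:
        # Iteratively predict the next word from the last word of the sequence,
        # mirroring the "text continuation" task described in the excerpt.
        rng = random.Random(seed)
        out = prompt.split()
        for _ in range(length):
            nxt = self.counts.get(out[-1])
            if not nxt:
                break  # no observed continuation for this word
            words_, weights = zip(*nxt.items())
            out.append(rng.choices(words_, weights=weights)[0])
        return " ".join(out)
```

Usage follows the pretrain-then-adapt pattern the paper describes: `train()` on a broad corpus, `train()` again on a domain-specific one, then `generate()` from a prompt. The fine-tuning pass shifts the transition counts, and hence the generated continuations, toward the narrower domain.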


Directory

Via [1]:

"The structure of the nascent industry is likely to greatly change, but at the moment there are a few key actors creating more general-purpose GFMs, such as OpenAI, Midjourney, EleutherAI, BigScience, and Stability.

More-specialized companies often build off the technology released by the actors above, applying the technology for specific tasks:

Copywriting (Copy.ai, Jasper, NeuralText, Nichesss)

Website generation (The.com, Debuild)

Marketing and stock image generation (Shutterstock, Picsart, Jasper Art)

General image editing and design (Photoshop plugin, Microsoft’s Designer)

General writing, editing, and content management (Microsoft Teams Premium, NotionAI, Coda + OpenAI, Chibi AI)

Video editing (Runway, Kino)

Code generation (Github Copilot, Tabnine)

Research assistants (Elicit)

Building floor plans for home renovations (Tailorbird)

Excel/data wrangling and cleaning (Charm)"