Artificial Worldviews

How will »prompting« alter our perception?

What types of aesthetics will large language models bring to the world?

In what ways will technologies like ChatGPT affect notions, principles,
and styles for the coming decade?

Artificial Worldviews is a series of inquiries into the system underlying ChatGPT, probing its descriptions of the world. Through prompting, data gathering, and mapping, the project investigates the data structures of »artificial intelligence« systems.

Artificial Intelligence and Machine Learning methods are often referred to as black boxes, meaning that users cannot understand their inner workings. However, this trait is shared by all living beings: we come to know a person not by examining their brain structure but by conversing with them. The so-called black box is not impenetrable, since we can gain an understanding of its inner workings by interacting with it. Through individual inquiries, we can acquire only anecdotal evidence about the network. By systematically querying ChatGPT's underlying programming interface, however, we can map the synthetic data structures of the system.

In my research, I methodically request data about large-scale, indefinable human concepts and visualize the results. These outputs visualize expansive data structures and unusual, sometimes unsettling worldviews that would otherwise be unimaginable. The terms »power« and »knowledge« unfold vast discourses, from philosophy and politics to the social and natural sciences; they hold multidimensional meanings within social relations. The resulting graphics resemble narratives found in the works of Franz Kafka or Jorge Luis Borges: an infinite library of relational classifications, bureaucratic structures, and capricious mechanisms of inclusion and exclusion.

Artificial Worldviews is a project by Kim Albrecht in collaboration with metaLAB (at) Harvard & FU Berlin, and the Film University Babelsberg KONRAD WOLF. The project is part of a larger initiative researching the boundaries between artificial intelligence and society.

Foundations

What we call »AI« is a set of nested noxious operations.

triangle of large language models.

Computational Averages

At the computational level, large language models, a class of deep learning architectures, are operational modes for transforming words and word fragments, called tokens, into high-dimensional relational vectors. When you make a query in the ChatGPT interface, the system finds the most likely next token for the given input. The result is a continual feedback calculation of the most likely, most average next term under the given model.[1]
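This pick-the-most-likely-next-token loop can be illustrated with a toy bigram model. This is only a minimal pure-Python sketch: GPT models use learned high-dimensional vectors rather than raw counts, but the greedy feedback loop, appending the most likely continuation and feeding it back in, is the same in spirit.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each token follows each other token."""
    tokens = text.split()
    follows = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        follows[a][b] += 1
    return follows

def generate(follows, start, length=5):
    """Greedy decoding: repeatedly append the most likely next token."""
    out = [start]
    for _ in range(length):
        nxt = follows.get(out[-1])
        if not nxt:  # dead end: no observed continuation
            break
        out.append(nxt.most_common(1)[0][0])
    return " ".join(out)

follows = train_bigrams("a b a b a c")
print(generate(follows, "a", 3))  # "a b a b" — always the most average continuation
```

However small, the sketch exposes the core property the text describes: without randomness (temperature 0, in API terms), the model deterministically reproduces the most average continuation of its training data.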

Stolen & Harmful Data

Three billion of the tokens on which GPT-3 was trained come from Wikipedia. The vast majority of the training data, 410 billion tokens, comes from a project called Common Crawl. In September/October 2023 alone, Common Crawl archived 3.40 billion web pages. This means that if you have posted something on the web, on social media, or on your own website, that content is most likely part of the dataset.[2]

ChatGPT is us.

For this reason, Holly Herndon suggested that 'Collective Intelligence' (CI) is a more productive term than the deceptive, much-abused term AI.[3] In addition, people who post things online are far from the best possible data source. The internet is certainly not filled with the purest knowledge of humanity: bullying, trolling, stalking, crime, spam, pornography, violence, hate speech, radicalism, to name just a few categories. Furthermore, this data comes with historical baggage: the data that ChatGPT is trained on comes, by design, from the past. As part of the project, I asked ChatGPT to categorize people by 'race'. The system never objected that this might not be an appropriate request. Instead, it returned a racist categorization scheme, including the term 'Caucasian', a collective racial term that clearly should not be used. To what extent do we become trapped in harmful historical norms by embedding these systems in our lives?

Abusive Labour

To combat the malicious content that ChatGPT is trained on, its parent company OpenAI hired the San Francisco-based company Sama to label the dataset. Sama employs workers in Kenya, Uganda, and India for less than two dollars an hour to read texts about child sexual abuse, bestiality, murder, suicide, torture, self-harm, and incest. The ethical guardrails of ChatGPT and other AI systems are thus built on brutal labor in the Global South.[4]

Environmental Destruction

Training machine learning models and deploying AI systems can be energy-intensive. The capabilities of these systems depend not only on the data but also on the amount of energy and resources used in computation. Unfortunately, there are no official figures on the environmental impact of training or deploying models, but estimates suggest that the impact is significant.

Data Collection

The Initial Dataset

The OpenAI Application Programming Interface (API) structures calls into two messages: the user message and the system message. While the user message is similar to the text you enter into the frontend of ChatGPT, the system message helps set the behavior of the assistant.

System message:

You are ChatGPT, a mighty Large Language Model that holds knowledge about everything in the world and was trained on a massive corpus of text data, around 570GB of datasets, including web pages, books, and other sources.

User message:

Create a dataset in table format about the categories of all the knowledge you have. The table should contain at least 30 rows and 10 columns. Pick the dimensions as they make the most sense to you.

I sent this request six times, with six different temperatures: 0, 0.2, 0.4, 0.6, 0.8, and 1. The temperature parameter determines the randomness of the responses: the higher the temperature, the more the results vary.
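The six initial requests can be sketched as payloads in the OpenAI chat-completions format; variable and function names here are illustrative, and the actual scripts used by the project are not published in this text.

```python
SYSTEM_MSG = (
    "You are ChatGPT, a mighty Large Language Model that holds knowledge "
    "about everything in the world and was trained on a massive corpus of "
    "text data, around 570GB of datasets, including web pages, books, and "
    "other sources."
)
USER_MSG = (
    "Create a dataset in table format about the categories of all the "
    "knowledge you have. The table should contain at least 30 rows and 10 "
    "columns. Pick the dimensions as they make the most sense to you."
)

def build_initial_requests():
    """One request payload per temperature, ready for the chat endpoint."""
    temperatures = [0, 0.2, 0.4, 0.6, 0.8, 1]
    return [
        {
            "model": "gpt-3.5-turbo",
            "temperature": t,
            "messages": [
                {"role": "system", "content": SYSTEM_MSG},
                {"role": "user", "content": USER_MSG},
            ],
        }
        for t in temperatures
    ]
```

Each payload pairs the same system and user message with a different temperature, so any variation between the six responses is attributable to the randomness setting alone.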

The language model did not exactly follow the instructions. Instead of returning a table with at least 30 rows and ten columns, the model returned a list of categories and subcategories for each request. The resulting data file from the six API calls contained 31 fields and 425 subfields for the knowledge dataset.

artificial-worldviews-fields-subfields-listed

The Core Dataset

The core dataset was requested from the OpenAI API in 1764 API calls over three days. Humans and objects were requested separately for all 425 field–subfield combinations. Each of the 850 resulting calls was made twice: once with a temperature of 0 and once with a temperature of 0.5. All requests in the visualization were made to the model 'gpt-3.5-turbo'. The number of returned rows per request varied between five ('Linguistics' and 'Travel Budget') and 40 ('Mythology'). Due to this inconsistency, some fields hold more items than others.

User message:

List the most important humans in 'Arts' in the field of 'Film'. List their name, kind, category, description, related things, and importance (0 - 100) as a table.

The field 'Arts' and the subfield 'Film' were replaced with each of the 425 combinations of fields and subfields.
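The substitution can be sketched as a small template function (a hypothetical helper; the project's actual request code is not shown here):

```python
def core_prompt(kind, field, subfield):
    """User message for one core-dataset request.

    kind is 'humans' or 'objects'; field and subfield come from the
    31 fields and 425 subfields of the initial dataset.
    """
    return (
        f"List the most important {kind} in '{field}' in the field of "
        f"'{subfield}'. List their name, kind, category, description, "
        "related things, and importance (0 - 100) as a table."
    )

# Reproduces the example request above:
print(core_prompt("humans", "Arts", "Film"))
```

Iterating this function over every field–subfield combination, for both 'humans' and 'objects', yields the 850 distinct requests described above.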

data-capture-diagram

Example: Nature

The following graph represents the data returned for the category designated as 'nature'.

artificial-worldviews-data-scraping-example-nature

1.

For each subfield, two requests are made: one for humans and one for objects.

2.

GPT returns a dataset for each request, which includes a list of items and additional information.

3.

This additional information contains an array of related items.

4.

As a result, a hierarchical network structure now exists within the prompted data for each category of knowledge.

The Final Dataset

The datasets for both queries, "knowledge" and "power," can be accessed and downloaded in spreadsheet format. Should you require further information, analysis, or visualizations based on the dataset, please do not hesitate to contact Kim Albrecht.

The Consumption

It is important to note that requests to ChatGPT and similar services are energy-intensive. According to Kasper Groes Albin Ludvigsen, one call consumes 0.00396 kWh. Thus, in total, creating the data cost around 7 kWh, equivalent to running a dishwasher for seven hours or baking something in the oven for an hour and 45 minutes.
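The figure is a back-of-the-envelope calculation, not a measurement: Ludvigsen's per-call estimate multiplied by the 1764 core-dataset calls.

```python
KWH_PER_CALL = 0.00396  # Ludvigsen's estimate of energy per request
N_CALLS = 1764          # core-dataset API calls (see Data Collection)

total_kwh = N_CALLS * KWH_PER_CALL
print(round(total_kwh, 2))  # ≈ 6.99 kWh, i.e. "around 7 kWh"
```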

Experiments

The data representation of Artificial Worldviews is determined by four forces: computation, data, labor, and material. The categorization schemes are layered and interlinked, reflecting computational averages, historical baggage, contemporary sacred data, labor guardrails to ensure an ethically correct chatbot, and the material bounds of computational possibilities and resources. The maps are not meant to provide answers but to ask questions. Are words turned into vectors the essence of language? Are computational averages suitable modes to think about culture?

Power is not merely a way to enforce decisions; it is also a mechanism through which certain knowledge is produced and propagated. This knowledge in turn reinforces and justifies the existence and exercise of power.

knowledge-power-foucault

Just as power produces knowledge, that knowledge reinforces power by making it seem justified, natural, or medically/scientifically necessary. This means that those in positions of power are able to control a society's discourse, which in turn shapes what is known, how it's known, and who gets to know it.

artificial-worldviews-kim-albrecht-knowledge

Knowledge

What representation of knowledge does ChatGPT’s synthetic data contain? How does the collective intelligence of the internet-based data interplay with the training and constraining practices of OpenAI?

artificial-worldviews-kim-albrecht-power

Power

What depiction of power does ChatGPT hold? What categories of power are represented within the system, and how do they manifest in people, places, objects, and ideas?

Visualization

Layers

Each map consists of four layers. The first two layers are the fields and subfields in which the Generative Pre-Trained Transformer (GPT) categorizes the topics of knowledge and power. The third layer comprises the returned items representing the project’s core dataset. The fourth layer consists of connected related items, including people, objects, places, etc., that GPT-3.5 named as connections to the core items of the third layer.

Number of items in each layer

            Knowledge    Power
Fields             31       55
Subfields         425      746
Items           7,880   12,904
Connections    24,416   15,397

Positioning

artificial worldviews network structure visualization

The layout of the maps is based on network similarities. Fields connect to subfields, subfields connect to items, and items connect via related items. Elements linked to one another cluster together, while unconnected items drift apart.
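The clustering behaviour can be illustrated with a toy force-directed (spring) layout. The project's actual layout algorithm is not specified here, so this pure-Python sketch only demonstrates the principle: linked nodes attract, all node pairs repel.

```python
import math
import random

def spring_layout(nodes, edges, iterations=200, seed=1):
    """Toy force-directed layout: edges attract, all node pairs repel."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair: unconnected items drift apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d2 = dx * dx + dy * dy + 1e-9
                f = 0.01 / d2
                disp[a][0] += f * dx; disp[a][1] += f * dy
                disp[b][0] -= f * dx; disp[b][1] -= f * dy
        # Attraction along edges: linked elements cluster together.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            disp[a][0] -= 0.05 * dx; disp[a][1] -= 0.05 * dy
            disp[b][0] += 0.05 * dx; disp[b][1] += 0.05 * dy
        # Clamp the step size for numerical stability.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, 0.1)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step
    return pos
```

Running this on two triangles with no edges between them pulls each triangle tight while pushing the two groups apart, which is exactly the behaviour the maps exploit: distance on the map encodes the density of shared connections.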

The knowledge map indicates that the two fields of "Mathematics" and "Sports" are relatively distant from one another, as ChatGPT did not identify many connections between them. In contrast, the fields of "Politics" and "Social Sciences" are situated in close proximity, reflecting their shared network connections.

Mathematics & Sports
artificial worldviews kim albrecht visualization sports mathematics
artificial worldviews kim albrecht visualization politics social science
Politics & Social Science

Investigation

To understand the map’s meaning, it is essential to understand the forces and restrictions guiding GPT-3.5. Large Language Models (LLMs) are bound by at least three forces: the technical infrastructures of computation, the training data, and the post-training moderation.[1]

While the map looks somewhat like a cumulative map of Wikipedia entries, Wikipedia makes up only a small fragment of the training data (3 billion tokens). The vast majority, 410 billion of the total 499 billion tokens that GPT-3 was trained on, comes from Common Crawl, a nonprofit organization that has crawled the web since 2008.[2] The basis of the learning system behind GPT-3 is text from the internet. Millions of individuals wrote this content: blogs, essays, news stories, reviews, and more. OpenAI used it free of charge. However, the web does not contain only the pinnacle of human thought; it includes everything ever published on the internet, from clusters of vegan-turkey recipes with rice-paper skin to adult fan fiction about Batman and Robin. How does a system such as ChatGPT handle such a spread of information? Each prompt allows for a certain degree of interpretation, and the question thus arises as to how the system fills this space.

The mappings of knowledge and power allow for a multitude of perspectives, examinations, and descriptions. I will observe the maps on two levels: the level of individuals and the level of fields.

  1. See section Foundations
  2. Language Models are Few-Shot Learners by Tom B. Brown et al.

Individual Centralities

One of the dataset’s most striking features emerges from simply counting how frequently ChatGPT named things. The graphic below shows the most frequently named things in each list: 'Power' on the left and 'Knowledge' on the right.

artificial worldviews most named individuals power knowledge

The items most frequently named in the "Power" map are Nelson Mandela, the Internet, and Mahatma Gandhi. In the "Knowledge" map, Rachel Carson, Jane Goodall, and Aristotle are the most frequently named individuals. Wangari Maathai, a Kenyan social, environmental, and political activist, is ranked fourth in both maps. Below the top four names, the knowledge map in particular confirms a Western and male-dominated concept of knowledge.

The 100 most frequently named items are presented in both maps below, from Thor to Rachel Carson and from Henri Fayol to Mahatma Gandhi. The location of each person, place, idea, or thing is highlighted on the map in purple, and the connected elements, as well as the categories and subcategories, are visualized. To stop the animation, hover over the maps.

Top 100 | Power

Top 100 | Knowledge

Fields

The visual representation of the knowledge map and power map, which includes 31 and 55 fields, respectively, and their corresponding subfields and items, allows for the observation of the scale and spread of each field. The closer the points gather around a term, the tighter the network is structured. Conversely, the more space a field absorbs, the more connections the items of a field have to other fields, or the more loosely the network is connected.

Fields of Knowledge

Fields of Power

Appearances

nature journal cover kim albrecht artificial worldviews
In December 2023, Artificial Worldviews appeared on the cover of the scientific journal Nature.
kim albrecht presentation of artificial worldviews at einstein center for digital future.
Discussion at "Man & Machine - Back to Nature with more Artificial Intelligence?", Einstein Center Digital Future, Berlin © ECDF/PR/berlin-eventfotograf.de

Lectures and Talks

14/03/2024

metaLAB (at) Basel Launch

Basel, Switzerland

07/05/2024

Seminar Visit FU Berlin

Kultur- und Medienmanagement, FU Berlin

23/04/2024

Data as Material for Design Lecture

University of Applied Sciences and Arts of Southern Switzerland

25/04/2024

Man & Machine - Back to Nature with more Artificial Intelligence?

Einstein Center Digital Future, Berlin

04/03/2024

Computing Fantasy Class Harvard

Harvard University

05/03/2024

Talk at Boston City Hall

Boston City Hall

06/03/2024

Network Science Institute Talk 2024

Network Science Institute, Northeastern University

08/02/2024

College of Arts, Media and Design at Northeastern University

Northeastern University

07/02/2024

metaLAB (at) Harvard Artificial Worldviews Talk

metaLAB (at) Harvard

30/11/2023

Keynote MOVE! Ideafest 2023

Filmuniversität Babelsberg Konrad Wolf

17/11/2023

Growing Virtuality Conference

Ruhr-University Bochum

Media Coverage

Articles and Writings

19/09/2023

Visualizing ChatGPT’s Worldview

Nightingale Magazine

Credits

Artificial Worldviews is a project by Kim Albrecht in collaboration with metaLAB (at) Harvard & Berlin, and the Film University Babelsberg KONRAD WOLF. The project is part of a larger initiative researching the boundaries between artificial intelligence and society.

The datasets generated by the project can be accessed and downloaded from this location.

Further metaLAB projects may be accessed via metaLAB (at) Harvard & Berlin

For media inquiries, exhibit and speaking requests, and general inquiries, please contact Kim Albrecht via email.

Copyright © 2017 Kim Albrecht all rights reserved