Artificial Worldviews

How will »prompting« alter our perception?

What types of aesthetics will large language models bring to the world?

In what ways will technologies like ChatGPT affect notions, principles,
and styles for the coming decade?

Artificial Worldviews is a series of inquiries into the system underlying ChatGPT, asking how it describes the world. Through prompting, data gathering, and mapping, the project investigates the dataframes of »artificial intelligence« systems.

Artificial Intelligence and Machine Learning methods are often referred to as black boxes, meaning that the user cannot understand their inner workings. However, this trait is shared by all living beings: we come to know a person not by examining their brain structures but by conversing with them. The so-called black box is not impenetrable, since we can gain an understanding of its inner workings by interacting with it. Through individual inquiries, we can only acquire anecdotal evidence about the network. By systematically querying ChatGPT's underlying programming interface, however, we can map the synthetic data structures of the system.

In my research, I methodically request data about large-scale, indefinable human concepts and visualize the results. The outputs reveal expansive data structures and unusual, sometimes unsettling worldviews that would otherwise be unimaginable. The terms »power« and »knowledge« unfold vast discourses across philosophy, politics, the social sciences, and the natural sciences; they hold multidimensional meanings within social relations. The resulting graphics resemble narratives found in the works of Franz Kafka or Jorge Luis Borges, representing an infinite library of relational classifications, bureaucratic structures, and capricious mechanisms of inclusion and exclusion.

Artificial Worldviews is a project by Kim Albrecht in collaboration with metaLAB (at) Harvard & FU Berlin, and the Film University Babelsberg KONRAD WOLF. The project is part of a larger initiative researching the boundaries between artificial intelligence and society.

Foundations

What we call »AI« is a set of nested noxious operations.

triangle of large language models.

Computational Averages

At the computational level, large language models, a class of deep learning architectures, are operational modes for transforming words and word fragments, called tokens, into high-dimensional relational vectors. When you make a query in the ChatGPT interface, the system finds the most likely next token for the given sequence. The result is a repeated feedback calculation of the most likely, most average next term under the given model.[1]
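As a minimal sketch of this feedback loop, using an invented toy vocabulary and invented scores purely for illustration, next-token prediction amounts to turning model scores into probabilities and sampling one token at a time; the temperature parameter used later in this project scales that distribution:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0):
    """Turn raw model scores (logits) into probabilities and pick one token.

    At temperature 0 the single most likely, most average token is always
    chosen; higher temperatures flatten the distribution and add variation.
    """
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        return int(np.argmax(logits))
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.default_rng().choice(len(probs), p=probs))

# Toy vocabulary and scores, invented for illustration only.
vocab = ["power", "knowledge", "is", "the", "."]
logits = [2.1, 1.9, 0.3, 0.2, -1.0]
print(vocab[sample_next_token(logits, temperature=0)])    # always "power"
print(vocab[sample_next_token(logits, temperature=0.8)])  # varies between runs
```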

Stolen & Harmful Data

Two billion of the tokens on which GPT-3.5 is trained come from Wikipedia. The vast majority of the training data, 410 billion tokens, comes from a project called Common Crawl. In September and October 2023 alone, Common Crawl archived 3.4 billion web pages. If you have posted something on the web, on social media, or on your own website, that content is therefore most likely part of the dataset.[2]

ChatGPT is us.

For this reason, Holly Herndon suggested that ‘Collective Intelligence’ (CI) is a more productive term than the deceptive, over-abused term AI.[3] In addition, people who post things online are far from the best possible data source. The Internet is certainly not filled with the purest knowledge of humanity: bullying, trolling, stalking, crime, spam, pornography, violence, hate speech, radicalism, to name just a few categories. Furthermore, this data comes with historical baggage. By design, the data that ChatGPT is trained on comes from the past. As part of the project, I asked ChatGPT to categorize people by 'race.' The system never objected that this might not be an appropriate request. Instead, it returned a racist categorization scheme, including the term Caucasian, a collective racial term that clearly should not be used. To what extent do we become trapped in harmful historical norms by embedding these systems in our lives?

Abusive Labour

To combat the malicious content that ChatGPT is trained on, its parent company OpenAI hired the San Francisco-based company Sama to label the dataset. Sama employs workers in Kenya, Uganda, and India for less than two dollars an hour to read texts about child sexual abuse, bestiality, murder, suicide, torture, self-harm, and incest. The ethical guardrails of ChatGPT and other AI systems are set, brutally, by workers in the Global South.[4]

Environmental Destruction

Training machine learning models and deploying AI systems can be energy-intensive. The capabilities of these systems depend not only on the data but also on the amount of energy and resources used in computation. Unfortunately, there are no official figures on the environmental impact of training or deploying models, but estimates suggest that the impact is significant.

Data Collection

The Initial Dataset

The OpenAI Application Programming Interface (API) structures calls into two messages: the user message and the system message. While the user message is similar to the text you enter into the frontend of ChatGPT, the system message helps set the behavior of the assistant.

System message:

You are ChatGPT, a mighty Large Language Model that holds knowledge about everything in the world and was trained on a massive corpus of text data, around 570GB of datasets, including web pages, books, and other sources.

User message:

Create a dataset in table format about the categories of all the knowledge you have. The table should contain at least 30 rows and 10 columns. Pick the dimensions as they make the most sense to you.

I made this request six times, with six different temperatures: 0, 0.2, 0.4, 0.6, 0.8, and 1. The temperature parameter determines the randomness of the responses: the higher the temperature of the request, the more the results vary.
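For reference, a sketch of what one of these calls could look like with the official openai Python client is shown below; the variable names and the truncated message strings are placeholders standing in for the messages quoted above, not the project's actual code:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholders for the system and user messages quoted above.
SYSTEM_MESSAGE = "You are ChatGPT, a mighty Large Language Model ..."
USER_MESSAGE = "Create a dataset in table format about the categories ..."

responses = []
for temperature in [0, 0.2, 0.4, 0.6, 0.8, 1]:
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=temperature,
        messages=[
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": USER_MESSAGE},
        ],
    )
    responses.append(completion.choices[0].message.content)
```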

The language model did not follow the instructions exactly. Instead of returning a table with at least 30 rows and ten columns, it returned a list of categories and subcategories in each request. The resulting data file from the six API calls contained 31 fields and 425 subfields for the knowledge dataset.

artificial-worldviews-fields-subfields-listed

The Core Dataset

The core dataset was requested from the OpenAI API in 1764 API calls over three days. Humans and objects were requested separately for all 425 fields and subfields. Each of the resulting 850 calls was made twice: once with a temperature of 0 and once with a temperature of 0.5. All requests in the visualization were made to the model "gpt-3.5-turbo". The number of rows returned per request varied between five ('Linguistics' and 'Travel Budget') and 40 ('Mythology'). Due to this inconsistency, some fields hold more items than others.

User message:

List the most important humans in 'Arts' in the field of 'Film'. List their name, kind, category, description, related things, and importance (0 - 100) as a table.

The field 'Arts' and the subfield 'Film' were replaced with each of the 425 combinations of fields and subfields.
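A sketch of the batch collection this describes, assuming a hypothetical helper request_items() that wraps a chat-completion call like the one shown earlier and returns the raw table text, and a hypothetical list field_subfield_pairs holding the 425 combinations:

```python
PROMPT = (
    "List the most important {kind} in '{field}' in the field of '{subfield}'. "
    "List their name, kind, category, description, related things, "
    "and importance (0 - 100) as a table."
)

results = []
for field, subfield in field_subfield_pairs:        # the 425 combinations
    for kind in ("humans", "objects"):
        for temperature in (0, 0.5):
            prompt = PROMPT.format(kind=kind, field=field, subfield=subfield)
            results.append({
                "field": field,
                "subfield": subfield,
                "kind": kind,
                "temperature": temperature,
                "response": request_items(prompt, temperature),  # hypothetical helper
            })
```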

data-capture-diagram

Example: Nature

The following graph represents the data returned from the category designated as "nature."

artificial-worldviews-data-scraping-example-nature

1. For each subfield, two requests are made: one for humans and one for objects.
2. GPT returns a dataset for each request, which includes a list of items and additional information.
3. This additional information contains an array of related items.
4. As a result, a hierarchical network structure now exists within the prompted data for each category of knowledge.
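Assuming each returned row has been parsed into a record with its field, subfield, item name, and related items, the hierarchical structure described above could be assembled into a graph roughly like this (a sketch using networkx, not the project's actual code):

```python
import networkx as nx

G = nx.Graph()

# `records` is assumed to hold one parsed row per returned item, e.g.
# {"field": "Nature", "subfield": "Ecology", "name": "...", "related": [...]}.
for row in records:
    G.add_node(row["field"], layer="field")
    G.add_node(row["subfield"], layer="subfield")
    G.add_node(row["name"], layer="item")
    G.add_edge(row["field"], row["subfield"])
    G.add_edge(row["subfield"], row["name"])
    for related in row["related"]:
        G.add_node(related, layer="connection")
        G.add_edge(row["name"], related)
```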

The Final Dataset

The datasets for both queries, "knowledge" and "power," can be accessed and downloaded in spreadsheet format.[1] Should you require further information, analysis, or visualizations based on the dataset, please do not hesitate to contact Kim Albrecht.

1. Artificial Senses Dataset by Kim Frederic Albrecht is licensed under CC BY-NC-SA 4.0

The Consumption

Requests to ChatGPT and similar services are energy-intensive. According to Kasper Groes Albin Ludvigsen, one call consumes 0.00396 kWh. In total, creating the data thus consumed around 7 kWh, equivalent to running a dishwasher for seven hours or baking something in the oven for an hour and 45 minutes.
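The arithmetic behind that figure, taking Ludvigsen's per-call estimate at face value:

```python
calls = 1764                 # API calls for the core dataset
kwh_per_call = 0.00396       # estimate by Kasper Groes Albin Ludvigsen
print(round(calls * kwh_per_call, 2))  # 6.99, i.e. roughly 7 kWh
```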

Experiments

The data representation of Artificial Worldviews is determined by four forces: computation, data, labor, and material. The categorization schemes are layered and interlinked, reflecting computational averages, historical baggage, contemporary sacred data, labor guardrails to ensure an ethically correct chatbot, and the material bounds of computational possibilities and resources. The maps are not meant to provide answers but to ask questions. Are words turned into vectors the essence of language? Are computational averages suitable modes to think about culture?

Power is not merely a way to enforce decisions; it is also a mechanism through which certain knowledge is produced and propagated. This knowledge in turn reinforces and justifies the existence and exercise of power.

knowledge-power-foucault

Just as power produces knowledge, that knowledge reinforces power by making it seem justified, natural, or medically/scientifically necessary. This means that those in positions of power are able to control a society's discourse, which in turn shapes what is known, how it's known, and who gets to know it.

artificial-worldviews-kim-albrecht-knowledge

Knowledge

What representation of knowledge does ChatGPT’s synthetic data contain? How does the collective intelligence of the internet-based training data interact with the training and constraining practices of OpenAI?

artificial-worldviews-kim-albrecht-power

Power

What depiction of power does ChatGPT hold? What categories of power are represented within the system, and how do they manifest in people, places, objects, and ideas?

Visualization

Layers

Each map consists of four layers. The first two layers are the fields and subfields into which the Generative Pre-Trained Transformer (GPT) categorizes the topics of knowledge and power. The third layer comprises the returned items, which form the project’s core dataset. The fourth layer consists of related items, including people, objects, places, and ideas, that GPT-3.5 named as connections to the core items of the third layer.

Number of items in each layer

             Knowledge    Power
Fields              31       55
Subfields          425      746
Items            7,880   12,904
Connections     24,416   15,397

Positioning

artificial worldviews network structure visualization

The layout of the maps is based on network similarities. Fields connect to subfields, subfields connect to items, and items connect via related items. Elements linked to one another cluster together, while unconnected items drift apart.
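A sketch of this positioning principle using a standard force-directed layout; networkx's spring_layout stands in here for whatever layout engine the project actually uses:

```python
import networkx as nx

# Continuing from the graph G sketched above: linked nodes attract and all
# nodes repel, so densely interconnected fields end up clustered together
# while sparsely connected ones drift apart.
positions = nx.spring_layout(G, iterations=100, seed=42)
# `positions` maps every node to an (x, y) coordinate on the map.
```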

artificial worldviews kim albrecht visualization sports mathematics

The knowledge map indicates that the two fields of "Mathematics" and "Sports" are relatively distant from one another, as ChatGPT did not identify many connections between them. In contrast, the fields of "Politics" and "Social Sciences" are situated in close proximity, reflecting their shared network connections.

artificial worldviews kim albrecht visualization politics social science

Investigation

The mappings of knowledge and power allow for a multitude of perspectives, examinations, and descriptions. I will observe the maps on two levels: the level of fields and the level of individuals.

Fields of Knowledge

Fields of Power

Top 100 | Knowledge

Top 100 | Power