Artificial Worldviews

How will »prompting« alter our perception?
What types of aesthetics will large language models bring to the world?
In what ways will technologies like ChatGPT affect notions, principles,
and styles for the coming decade?

Artificial Worldviews is a series of inquiries into the system underlying ChatGPT, probing its descriptions of the world. Utilizing prompting, data gathering, and mapping, this project investigates the dataframes of »artificial intelligence« systems.

Artificial Intelligence and Machine Learning methods are often referred to as black boxes, meaning that users cannot inspect their inner workings. However, this trait is shared by all living beings: we come to know a person not by examining their brain structure but by conversing with them. The so-called black box is not impenetrable, since we can gain an understanding of its inner workings by interacting with it. Individual inquiries yield only anecdotal evidence about the network. By systematically querying ChatGPT's underlying programming interface, however, we can map the structures of the system.
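
To make the method concrete, here is a minimal sketch of what such systematic querying could look like, assuming the official OpenAI Python SDK; the model name and prompt are illustrative stand-ins, not the project's actual queries.

    # A minimal sketch of systematic querying, assuming the OpenAI
    # Python SDK; model name and prompt are illustrative stand-ins.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask(prompt: str) -> str:
        # Send a single prompt to the chat completions endpoint.
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # Asking the same question many times turns anecdotes into a
    # distribution of answers that can be mapped.
    answers = [ask("List the major fields of knowledge.") for _ in range(10)]

Repetition is the point: one answer is an anecdote, while hundreds of answers to the same prompt outline the contours of the system.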

In my research, I methodically request data about large-scale, indefinable human concepts and visualize the results. The outputs reveal expansive data structures and unusual, sometimes unsettling worldviews that would otherwise remain unimaginable. The terms »power« and »knowledge« unfold vast discourses, from philosophy and politics to the social and natural sciences; they hold multidimensional meanings within social relations. The resulting graphics resemble narratives found in the works of Franz Kafka or Jorge Luis Borges: an infinite library of relational classifications, bureaucratic structures, and capricious mechanisms of inclusion and exclusion.
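
One way to turn such answers into a map is to treat each returned term as a node and each relation as an edge. The sketch below assumes the networkx library; the responses are invented examples, far smaller than the project's actual data.

    import networkx as nx

    # Invented example responses: each concept mapped to terms the
    # model returned for it; the project's real data is far larger.
    responses = {
        "power": ["state", "capital", "knowledge"],
        "knowledge": ["science", "philosophy", "power"],
    }

    graph = nx.Graph()
    for concept, terms in responses.items():
        for term in terms:
            graph.add_edge(concept, term)

    # Terms shared between concepts mark where the model's
    # classifications overlap.
    print(nx.degree_centrality(graph))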

Foundations

What we call »AI« is a set of nested noxious operations.

[Diagram: triangle of large language models]

Computational Averages

At the computational level, large language models, a class of deep learning architectures, are operational modes for transforming words and word fragments, called tokens, into high-dimensional relational vectors. When a query is entered in the ChatGPT interface, the system predicts the most likely next token for the given sequence. The result is a continual feedback calculation of the most likely, most average next term under the given model.[1]
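
A toy sketch can make this feedback calculation tangible. The bigram table below is invented for illustration and stands in for billions of learned parameters; an actual model conditions on the entire preceding sequence, not just the last token.

    import random

    # Invented bigram table: probability of the next token given the
    # current one; a stand-in for billions of learned parameters.
    MODEL = {
        "the": {"world": 0.5, "average": 0.5},
        "world": {"is": 1.0},
        "average": {"is": 1.0},
        "is": {"the": 1.0},
    }

    def next_token(token: str) -> str:
        # Sample in proportion to likelihood; greedy decoding would
        # instead always pick the single most probable candidate.
        candidates = MODEL[token]
        return random.choices(list(candidates), weights=list(candidates.values()))[0]

    tokens = ["the"]
    for _ in range(8):
        tokens.append(next_token(tokens[-1]))
    print(" ".join(tokens))

Each output token is fed back as input and the loop continues: the »conversation« is nothing but this repeated averaging.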

Stolen & Harmful Data

Two billion of the tokens on which GPT-3.5 is trained come from Wikipedia. The vast majority of the training data, 410 billion tokens, comes from a project called Common Crawl. In September and October 2023 alone, Common Crawl archived 3.4 billion web pages. This means that if you have posted something on the web, on social media, or on your own website, that content is most likely part of the dataset.[2]
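
This claim can be tested. Common Crawl publishes a public CDX index per crawl; the sketch below, assuming the requests library, checks whether a URL was captured in the September/October 2023 crawl (crawl id CC-MAIN-2023-40).

    import requests

    def in_common_crawl(url: str, crawl: str = "CC-MAIN-2023-40") -> bool:
        # The CDX index returns one JSON record per capture of the URL
        # and an empty result (HTTP 404) if it was never archived.
        index = f"https://index.commoncrawl.org/{crawl}-index"
        response = requests.get(index, params={"url": url, "output": "json"})
        return response.status_code == 200 and bool(response.text.strip())

    print(in_common_crawl("example.com"))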

ChatGPT is us.

But people who post things online are far from the best possible source of data. The Internet is certainly not filled with the purest knowledge of humanity: bullying, trolling, stalking, crime, spam, pornography, violence, hate speech, radicalism, to name just a few categories.

In addition, this data carries historical baggage: what ChatGPT is trained on comes, by design, from the past. As part of the project, I asked ChatGPT to categorize people by »race«. The system never objected that this might be an inappropriate request. Instead, it returned a racist categorization scheme, including the term »Caucasian«, a collective racial label that clearly should not be used. To what extent do we become trapped in harmful historical norms by embedding these systems in our lives?

Abusive Labour

To combat the malicious content that ChatGPT is trained on, its parent company OpenAI hired the San Francisco-based company Sama to label the dataset. Sama employs workers in Kenya, Uganda, and India for less than two dollars an hour to read texts about child sexual abuse, bestiality, murder, suicide, torture, self-harm, and incest. The ethical boundaries of ChatGPT and other AI systems are brutally drawn by workers in the Global South.[3]

Environmental Destruction

Training machine learning models and deploying AI systems can be energy-intensive. The capabilities of these systems depend not only on the data but also on the amount of energy and resources used in computation. Unfortunately, there are no official figures on the environmental impact of training or deploying models, but estimates suggest that the impact is significant.

Experiments

The data representation of Artificial Worldviews is determined by four forces: computation, data, labor, and material. The categorization schemes are layered and interlinked, reflecting computational averages, historical baggage, contemporary scraped data, labor guardrails meant to ensure an ethically correct chatbot, and the material bounds of computational possibilities and resources. The maps are not meant to provide answers but to ask questions. Are words turned into vectors the essence of language? Are computational averages suitable modes for thinking about culture?

[Image: Artificial Worldviews, map of »knowledge«]

Knowledge

What kinds of knowledge exist? Which people, places, objects, and ideas matter to these fields of knowledge?

[Image: Artificial Worldviews, map of »power«]

Power

What is power, in what forms does it exist, and who or what holds that power?

Data Collection

[Diagram: data capture process]