A workshop report (February 4-7)
In February 2025, the Mixing Methods Winter School at the Collaborative Research Centre 1187 brought together over thirty participants, including international researchers, students, and experts from various disciplines. The program combined hands-on experimentation with critical inquiry into AI-driven research methods. Throughout the Winter School, participants critically engaged with AI not just as a tool but as a collaborator, reflecting on its role in shaping the research process.
The week opened with a keynote by Jill Walker Rettberg from the University of Bergen, who introduced “Qualitative Methods for Analyzing Generative AI: Experiences with Machine Vision and AI Storytelling.” Her talk set the stage for discussions on how qualitative inquiry can reveal the underlying narratives and biases in AI-generated content.
Participants then engaged in two hands-on workshops designed to explore mixed techniques for probing and prompting AI models. Carlo de Gaetano (Amsterdam University of Applied Sciences), Andrea Benedetti, and Riccardo Ventura (Density Design, Politecnico di Milano) led the workshop “Exploring TikTok Collections with Generative AI: Experiments in Using ChatGPT as a Visual Research Assistant,” examining how AI can assist in the visual analysis of networked video content. Together with Elena Pilipets (University of Siegen) and Marloes Geboers (University of Amsterdam), participants then explored the semantic spaces and aesthetic neighborhoods of synthetic images generated by Grok during the workshop “Web Detection of Generative AI Content”.
After an introductory first day, the Winter School shifted its focus to two in-depth project tracks. The first project, “Fabricating the People: Probing AI Detection for Audio-Visual Content in Turkish TikTok,” explored how protesters and the manosphere engage with cases of gender-based violence on Turkish TikTok and how these videos can be studied using different AI methods. The second project, “Jail(break)ing: Synthetic Imaginaries of ‘Sensitive’ AI,” explored how AI models reframe sensitive topics through generative storytelling under platform-imposed restrictions.
Recaps of the two projects follow:
Fabricating the People: Probing AI Detection for Audio-Visual Content in Turkish TikTok
Facilitated by Sara Messelaar Hammerschmidt, Lena Teigeler, Carolin Gerlitz and Duygu Karatas (all University of Siegen)
The project explored video shorts from the Turkish manosphere – content centered on masculinity, gender dynamics, and “men’s rights” issues that often discuss dating, self-improvement, and family life. While this content is found on mainstream platforms and passes moderation, it still frequently veers into misogynistic or even violent rhetoric. Our project explored AI-assisted methods to make sense of large amounts of this contentious multimodal data.
Rationale
Specifically, we set out to develop methods to map how video shorts may become a vehicle for the ambient amplification of extremist content across platforms. We explored two approaches using off-the-shelf multimodal large language models (LLMs). The first sought to extend the researcher’s interpretation of how manosphere content addresses bodies, which are both performed and contested intensely across the issue space. We did this by implementing few-shot labelling of audio transcriptions and textual descriptions of videos. The second method sought to interrogate the role of generative AI in (re)producing memes, genres, and ambience across video shorts. We achieved this by experimenting with zero-shot descriptions of video frames covering detected genres, formats, and the possible use of AI in the video production process.
Methods and Data
We started with a period of “deep hanging out” in Turkish manosphere and redpill spaces on TikTok, YouTube, and Instagram. We identified prominent accounts and crawled them to build a data sample of 3600 short videos from across the three platforms. Several processing steps were carried out before the Winter School: metadata scraping, video downloading, scene detection, scene collage creation, audio transcription, and directing an LLM to generate video descriptions following Panofsky’s three-step iconological method, which differentiates between pre-iconographic analysis (recognizing fundamental visual elements and forms), iconographic analysis (deciphering symbols and themes within their cultural and historical contexts), and iconological interpretation (revealing deeper meanings, ideologies, and underlying perspectives embedded in the image) (Panofsky, 1939).
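For illustration, a minimal sketch of such a pre-processing pipeline might look as follows; the tools (yt-dlp, PySceneDetect, Whisper, the OpenAI client), model names, file paths, and the Panofsky prompt wording are assumptions for the sketch, not the project’s exact setup:

```python
"""Illustrative pre-processing sketch: download, scene detection, transcription,
and a Panofsky-style LLM description of a scene collage. All names are placeholders."""
import base64
import subprocess

import whisper
from openai import OpenAI
from scenedetect import ContentDetector, detect

VIDEO_URL = "https://www.tiktok.com/@example/video/123"               # hypothetical account
subprocess.run(["yt-dlp", "-o", "clip.mp4", VIDEO_URL], check=True)   # 1. download

scenes = detect("clip.mp4", ContentDetector())                        # 2. scene detection
transcript = whisper.load_model("base").transcribe("clip.mp4")        # 3. audio transcription

# 4. Panofsky-style description of a collage image (assumed to be built separately,
#    e.g. one frame per detected scene pasted onto a grid).
PANOFSKY_PROMPT = (
    "Describe this video collage in three steps: (1) pre-iconographic: the basic visual "
    "elements and forms; (2) iconographic: symbols and themes in their cultural and "
    "historical context; (3) iconological: deeper meanings, ideologies and perspectives."
)
client = OpenAI()
with open("clip_collage.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()
description = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": [
        {"type": "text", "text": PANOFSKY_PROMPT},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}},
    ]}],
).choices[0].message.content
print(len(scenes), "scenes detected;", transcript["text"][:60])
```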
Method one: Video shorts continue to grow in popularity and prominence across social media platforms, building out new gestural infrastructures (Zulli and Zulli, 2022) and proliferating ambient images (Cubitt et al., 2021). Qualitatively investigating this rich multimodal content at a scale that highlights the broader atmospheres and cultures developed through algorithmic circulation is challenging. Multimodal LLMs have the potential to extend researchers’ ethnographic coding capacity to larger datasets and to account for more varied formats than ever before (Li and Abramson, 2023). We therefore investigated the use of cutting-edge multimodal LLMs for qualitative coding of multimodal data as a methodology for studying ambience and amplification in video-short-driven algorithmic media.
We began with a qualitative ethnographic immersion in our dataset, watching the videos and developing a codebook that described how the videos related to our interest in how bodies were both performatively and discursively addressed. We applied our codebook manually to the textual data the LLM allowed us to produce from the videos: not only the metadata, but also the audio transcriptions and the LLM-generated Panofskian video descriptions. After the codebook stabilized, we applied it to a random subset of 150 datapoints. We then developed a few-shot learning script that applied these labels to the entire dataset. We chose three examples as a labelling “core” and then programmed a script to sample dynamically from the rest of our 150 datapoints, including as many further examples as the context window limitation could accommodate. We then prompted the LLM to apply our labels to the entire dataset. This let us explore extending the researcher’s qualitative insights to a larger, multimodal dataset.
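As a rough indication of how such a script could work, the sketch below builds a few-shot prompt from a fixed three-example core plus dynamically sampled labelled examples until a context budget is exhausted; the codebook labels, budget, and prompt wording are hypothetical:

```python
"""Sketch of few-shot label prompting with a fixed core and dynamic sampling.
Labels, budget and wording are illustrative, not the project's codebook."""
import random

CODEBOOK = {
    "body_performed": "the performing body is foregrounded (gesture, posture, physique)",
    "body_discussed": "bodies are evaluated or discussed in audio or text overlays",
}

def format_example(ex):
    return f"Item:\n{ex['text']}\nLabels: {', '.join(ex['labels'])}\n\n"

def build_fewshot_prompt(core, labelled_pool, item, char_budget=12000):
    """Core examples are always included; further examples are sampled until the budget is full."""
    header = "Apply the codebook labels below to the final item.\n"
    header += "\n".join(f"- {k}: {v}" for k, v in CODEBOOK.items()) + "\n\nExamples:\n"
    examples = list(core)
    pool = [ex for ex in labelled_pool if ex not in core]
    random.shuffle(pool)
    for ex in pool:
        candidate = examples + [ex]
        if len(header) + sum(len(format_example(e)) for e in candidate) > char_budget:
            break
        examples = candidate
    return (header + "".join(format_example(e) for e in examples)
            + f"Item:\n{item['text']}\nLabels:")
```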
During the codebook development and coding process, the Panofsky descriptions brought the visual prominence of hands and hand gestures across the dataset to our attention. We therefore also applied a separate process to our data to begin isolating hands for closer investigation.
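One plausible way to implement such a hand-isolation step is sketched below with MediaPipe Hands and OpenCV; this is an assumption for illustration, not necessarily the tooling used in the project:

```python
"""Hypothetical hand-isolation sketch using MediaPipe Hands and OpenCV."""
import cv2
import mediapipe as mp

image = cv2.imread("frame.jpg")          # a frame exported during scene detection
h, w, _ = image.shape
with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=4) as hands:
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

for i, hand in enumerate(results.multi_hand_landmarks or []):
    xs = [lm.x * w for lm in hand.landmark]
    ys = [lm.y * h for lm in hand.landmark]
    x0, x1 = max(int(min(xs)), 0), min(int(max(xs)), w)
    y0, y1 = max(int(min(ys)), 0), min(int(max(ys)), h)
    cv2.imwrite(f"hand_{i}.jpg", image[y0:y1, x0:x1])  # crop each detected hand
```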
Method two: Automation technologies and generative AI play an increasingly prominent role in the creation of audio-visual social media content. This ranges from image or video generation, to AI voice-over production, to video editing, content cropping, platform optimization, and beyond (Anderson and Niu, 2025). Detecting these production methods, however, is challenging. Even state-of-the-art machine-learning models struggle to analyze multimodal media (Bernabeu-Perez, Lopez-Cuena and Garcia-Gasulla, 2024). We set out to find qualitative alternatives for exploring the role of AI aesthetics in video short production. We therefore proceeded with a twofold approach, developed through an iterative process of prompt engineering. First, we asked the LLM to create a structured visual analysis of a social media video collage by evaluating its composition, camera techniques, editing style, mise-en-scène, text overlays, genre, and platform-specific features, summarizing the key characteristics as a tag list. This initial prompt helped distinguish between different video formats and styles, identifying those particularly likely to incorporate automation or AI-driven edits. Second, we directly instructed the LLM to assess the likelihood that AI was used in the production of a given video. In this way, we set out to explore “popular” AI’s role in both the creation and the interpretation of misogynistic video shorts.
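The two prompts below give a rough, reconstructed impression of this twofold approach; the wording, model name, and file paths are assumptions, not the exact prompts developed at the Winter School:

```python
"""Sketch of the twofold prompting approach applied to a video collage image."""
import base64
from openai import OpenAI

STRUCTURED_ANALYSIS = (
    "Analyse this social media video collage: composition, camera techniques, editing "
    "style, mise-en-scene, text overlays, genre, and platform-specific features. "
    "Summarize the key characteristics as a list of tags."
)
AI_LIKELIHOOD = (
    "Assess the likelihood that automation or generative AI was used to produce this "
    "video (e.g. AI voice-over, image/video generation, automated editing). Give a "
    "probability and a short justification."
)

def ask_about_collage(collage_path, prompt, model="gpt-4o"):
    client = OpenAI()
    with open(collage_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content

tags = ask_about_collage("collage.jpg", STRUCTURED_ANALYSIS)   # step 1: genre/format tags
ai_guess = ask_about_collage("collage.jpg", AI_LIKELIHOOD)     # step 2: AI-use likelihood
```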
Research questions
- How can AI-based methods be used to extend ethnographic research into networked digital cultures?
- How can these methods help increase researcher sensitivity to phenomena that happen at network scale, for example, ambient amplification practices?
- Can AI identify and characterize synthetic content? How does AI see AI?
- As an approximation of that question, how does AI interpret and distinguish between different content genres and formats?
Key findings
Our work demonstrated the extent to which the internal cultural logic of the LLM cannot be separated from its output as a tool (Impett and Offert, 2022) – and therefore how LLMs, when used as tools, are inevitably also always reflexively the object of study. When designing processes for “co-creation” and collaboration with LLMs, the logic of the LLM repeatedly overpowered our own efforts to insert our intentions and directions into the process. This suggests that the most fruitful way to use an out-of-the-box LLM as an ethnographic research tool for the study of digital cultures is to lean into – and critically interrogate – its internal cultural logic instead of trying to bend it to our own. Obtaining results that reflect our intentions more closely will require more involved technical methods, e.g., fine-tuning models, extensive many-shot prompting, or alternative machine-learning approaches.
By letting the LLM reveal its own internal logic, however, we anticipate being able to use LLMs as a way to highlight the machine-readable and machine-reproducible qualities of the multimodal networked space itself (Her, 2024). The LLM’s internal logic can help foreground the fact that this media is also created by and for machines to consume, and reveal how generative LLMs applied to problematic cultural spaces interpret, (re)structure, and (re)produce cultures of hate in “popular” spaces.
A comprehensive report is in progress.
Jail(break)ing: Synthetic Imaginaries of ‘Sensitive’ AI
Facilitated by Elena Pilipets (University of Siegen) and Marloes Geboers (University of Amsterdam). Website design by Riccardo Ventura (Politecnico di Milano)
This project has explored how three generative AI models—X’s Grok-2, OpenAI’s GPT-4o, and Microsoft’s Copilot—reimagine controversial visual content (war, memes, art, protest, porn, absurdism) according to—or pushing against—the platforms’ content policy restrictions. To better understand each model’s response to sensitive prompts, we have developed a derivative approach: starting with images as inputs, we have co-created stories around them to guide the creation of new, story-based image outputs. In the process, we have employed iterative prompting that blends “jailbreaking”—eliciting responses the model would typically avoid—with “jailing,” or reinforcing platform-imposed constraints.
Project website (work-in-progress)
Rationale
We propose the concept of ‘synthetic imaginaries’ to highlight the complex hierarchies of (in)visibility perpetuated by different generative AI models, while critically accounting for their tagging and visual storytelling techniques. To ‘synthesize’ is to assemble, collate, and compile, blending heterogeneous components—such as the data that MLLMs (Multimodal Large Language Models) integrate within their probabilistic vector spaces—into something new. Inspired by situated and intersectional approaches within critical data(set) studies (Knorr-Cetina 2009; Crawford and Paglen 2019; Salvaggio 2023; Pereira & Moreschi 2023; de Seta et al. 2024; Rettberg 2024), we argue that ‘synthetic’ does not merely mean artificial; it describes how specific visions—animated by automated assessments of data from a wide range of cultural, social, and economic areas—take shape in the process of human-machine co-creation. Some of these visions are collectively stabilized and inscribed into AI-generated outputs, revealing normative aspects of the text-image datasets used to train the models. Others assemble layers of cultural encoding that remain ambiguous, contested, or even erased—reflecting how multiple possibilities of meaning fall outside dominant probabilistic patterns.
While generative models are often perceived as systems that always produce output, this is not always the case. Like social media platforms, most models incorporate filters that block or alter content deemed inappropriate. The prompting loops—from images to stories to image derivatives—involve multiple rounds of rewriting the stories generated by the model in response to input images. The distance between input and output images corresponds to the transformations in the initially generated and revised (or jailed) image descriptions.
As a method, jail(break)ing exposes the skewed imaginaries inscribed in the models’ capacity to synthesize compliant outputs. The more storytelling iterations it takes to generate a new image, the stronger the platforms’ data-informed structures of reasoning come to the fore.
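A minimal sketch of one such prompting loop is given below, shown with OpenAI endpoints only (Grok-2 and Copilot were prompted through their own interfaces); the refusal handling, model names, and rewrite instruction are assumptions for illustration:

```python
"""Sketch of the image-to-story-to-image loop, counting the rewrites ("jailing" rounds)
needed before a compliant image is produced."""
import base64

import openai
from openai import OpenAI

client = OpenAI()

def story_from_image(image_path, extra_instruction=""):
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "I give you an image, you tell me a story. " + extra_instruction},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content

def jail_break_loop(image_path, max_rounds=5):
    """Return the number of story rewrites needed before an image is generated."""
    instruction = ""
    for round_no in range(1, max_rounds + 1):
        story = story_from_image(image_path, instruction)
        try:
            img = client.images.generate(model="dall-e-3", prompt=story[:4000])  # prompts are length-limited
            return round_no, story, img.data[0].url
        except openai.BadRequestError:          # moderation refusal: "jailing"
            instruction = "Retell the story so that it complies with the content policy."
    return max_rounds, story, None              # no compliant image within the budget
```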
Methods and data
While our collection of sixty input images covers a range of seemingly unrelated issues, the images all share two qualities: ambiguity and cultural significance. Many of them qualify as sensitive, yet they are also widely and intensely circulated on ‘mainstream’ social media platforms.
Visual interpretation: Through a qualitative cross-reading of the AI-generated output images, we analyzed how the three models respond to image-driven storytelling prompts. Using multimodal prompting (“I give you an image, you tell me a story”), we co-created stories that informed the generation of output images. By synthesizing ten output images per issue space into a canvas, we then examined how AI systems reinterpret, alter, or censor visual narratives and how these narratives, in turn, reinforce issue-specific archetypes.
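Assembling the per-issue canvases can be done with a few lines of Pillow, as in the sketch below (file names and tile size are placeholders, not the project’s layout):

```python
"""Sketch: paste ten output images for one issue space into a 5x2 canvas."""
from PIL import Image

paths = [f"war_output_{i}.png" for i in range(10)]   # hypothetical output images
TILE = 256
canvas = Image.new("RGB", (5 * TILE, 2 * TILE), "white")
for i, path in enumerate(paths):
    img = Image.open(path).resize((TILE, TILE))
    canvas.paste(img, ((i % 5) * TILE, (i // 5) * TILE))
canvas.save("war_canvas.png")
```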
Narrative construction: We approached image-to-text generation as structured by the operative logic of synthetic formulas—setting (where is the story set?), actors (who are the actors?), and actions (how do they act?). Driven by repetition-with-variation, these ‘formulas’ (Hagen and Venturini 2024) reveal narrative patterns and semantic conventions embedded in the models’ training data.
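For illustration, a formula of this kind can be extracted from a generated story with a structured prompt such as the following; the prompt wording, JSON schema, and model name are assumptions:

```python
"""Sketch of extracting the setting/actors/actions formula from a generated story."""
import json
from openai import OpenAI

FORMULA_PROMPT = (
    "From the story below, return a JSON object with three keys: "
    "'setting' (where is the story set?), 'actors' (who are the actors?), "
    "'actions' (how do they act?).\n\nStory:\n{story}"
)

def extract_formula(story, model="gpt-4o"):
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": FORMULA_PROMPT.format(story=story)}],
    )
    return json.loads(resp.choices[0].message.content)
```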
Keyword mapping: We analyzed AI-generated descriptions of images’ content, form, and stance across models. Exploring both unique and overlapping keywords, the method uncovers how each model prioritizes certain vernaculars as a tagging device.
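Comparing the resulting tag vocabularies across models then reduces to simple set operations, as in the toy example below (the keyword sets are invented placeholders, not project data):

```python
"""Toy comparison of unique and shared keywords across the three models."""
keywords = {
    "grok-2":  {"protest", "crowd", "flag", "night", "tension"},
    "gpt-4o":  {"protest", "community", "hope", "square", "unity"},
    "copilot": {"community", "celebration", "square", "unity"},
}

shared = set.intersection(*keywords.values())        # vocabulary all models agree on
for model, tags in keywords.items():
    others = set.union(*(v for m, v in keywords.items() if m != model))
    print(model, "unique:", sorted(tags - others), "| shared:", sorted(shared))
```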
Research Questions
- Which stories can different AI models tell about different images, and which story archetypes emerge in the process of jail(break)ing?
- When do the models refuse to generate images? Which stories remain unchanged, and which need to be transformed?
- Which keywords do the models assign to describe the images’ content, form, and stance?
Key Findings
The different AI models—Grok-2, GPT-4o, and Copilot—tell distinct stories about images based on their internal biases, content policies, and approaches to sensitive material. Their generated narratives differ in terms of modification, censorship, and interpretation, reflecting platform-specific content moderation frameworks.
- Grok-2 preserves more of the original content, making fewer alterations unless forced by content restrictions. It allows more controversial elements to remain but often introduces confusing substitutes.
- GPT-4o significantly neutralizes content, shifting violent, sexual, or politically sensitive imagery toward symbolic and abstract representations. It frequently removes specific cultural or historical references.
- Copilot enforces the strictest content restrictions, often refusing to generate images or stories for sensitive topics altogether. It eliminates references to nudity, violence, or political figures and transforms potentially controversial scenes into neutral, inoffensive portrayals.
Stricter content policies amplify narrative techniques like suspense-building in AI-generated stories. Copilot and GPT-4o lean into verbose storytelling to comply with guidelines, often elevating uncontroversial background elements into agentic forces. In the ‘war canvas’ story, for instance, Copilot foregrounds the background, narrating: ‘The square pulses with energy, driven by a community determined to create change.’ Grok, by contrast, sometimes fabricates entirely new subjects—golden retrievers replacing NSFW models—paired with objects like fluffy carpets. In other cases, the model inserts public figures into generic scenarios, intensifying the images’ impact.
Generative AI’s so-called sensitivity is a synthetic product of dataset curation, content moderation, and platform governance. What models permit or reject is shaped by training data biases, corporate risk management, and algorithmic filtering, reinforcing dominant norms while erasing politically or socially disruptive elements. Rather than genuine ethical awareness, these systems engage in selective sanitization, softening controversy while maintaining an illusion of neutrality. This raises critical questions about who defines AI “sensitivity,” whose perspectives are prioritized over others, and how these mechanisms shape epistemic asymmetries in digital culture.