On the Contingency of the Digital Archive

Tom J. Abi Samra | Aug. 18, 2020

Abu Hayyan al-Tawhidi, the famous 10th-century Abbasid litterateur, cites in his Akhlaq al-Wazirayn (The Morals of the Two Viziers) what Dhu al-Kifayatayn wrote of Ibn Abbad’s letter to him. Dhu al-Kifayatayn says,

Ibn Abbad left our city, al-Rayy, toward Isfahan, with an anticipated stop in Waramin. Instead, he went a bit further and stopped at a village inundated with salt water. His only purpose was to write to us: “I write this letter from an-Nawbahar on Saturday at midday” [Kitabi hadha min an-Nawbahar yawm as-sabt nusf an-nahar].*

What this anecdote tells us is that Ibn Abbad entered this barren village for one purpose only: to write rhyming prose (“Nawbahar” and “nahar”), known in Arabic poetics as saj‘.

This scene speaks to many things, among them classical Arabic poetics and the importance of linguistic aesthetics. It is with this scene, after all, that Moroccan writer and scholar Abdelfattah Kilito opens his book “Bi hibr khafiyy” (With Invisible Ink, 2018). What I see in this scene, however, is an encounter between chance and intention. Presented with the option of passing through Nawbahar, Ibn Abbad decides to take advantage of this happenstance, go to the village, and write his letter to Dhu al-Kifayatayn: an active response to a fortunate, albeit (semi-)random, encounter.

In many ways, archival work seems to function within this framework, at least as far as I have experienced it. It is in this encounter, or dialectic, between happenstance and intention that something of value emerges from the chaos of the archive, especially if the archive is barely organized at all. It is fair to assume that one does not simply find oneself in an archive: one intends to visit a certain archive for a certain reason. Researchers might know exactly what they are looking for or, at the very least, have an idea of what the archive has to offer. That said, once researchers are within the archive, once they inhabit it, one thing leads to another, and they end up encountering certain things and not others. Although one might argue that the possibilities in the archive are endless, I contend that with every step one takes in the archive, one simultaneously forgoes some number N of other possibilities. This is a mathematical argument that I do not wish to get into, but you can trust me on it.
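For readers who would rather not take this on trust, a simple branching model makes the intuition concrete. It is an illustrative assumption, not a claim about how any actual archive is structured. Suppose every item points to $b \ge 2$ further leads and the researcher follows exactly one lead per step for $n$ steps, so that there are $b^n$ possible paths in all. Choosing one lead at step $k$ forgoes the other $b-1$ leads, together with every path that would have branched from them over the remaining $n-k$ steps:

$$
N_k = (b-1)\,b^{\,n-k}, \qquad \sum_{k=1}^{n} N_k = (b-1)\sum_{j=0}^{n-1} b^{\,j} = b^{\,n}-1 .
$$

Summed over the whole visit, the researcher forgoes every path but the one actually taken.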

What is it like to work in an online archive? What are the limitations, or contingencies, of online archival work? What does online archival work afford the researcher, and what does it take away, vis-à-vis the physical archive?

While the material conditions that shape physical archival research seem obvious, especially to the researcher, the internet makes it feel as though we have access to anything and everything. The conditioning power of the digital is thus overlooked precisely because of its apparent non-materiality, that is, its digitality. This, however, is but a mirage: we are, in fact, conditioned by the online archive. What exactly are these material conditions? And why does it matter that we become aware of the contingency of the online archive specifically?

In “Code/Space: Software and Everyday Life,” Rob Kitchin and Martin Dodge discuss the active role software plays in our daily lives, that is, how software has agency. They write,

For us, software needs to be theorized as both a contingent product of the world and a relational producer of the world. Software is written by programmers, individually and in teams, within diverse social, political, and economic contexts. The production of software unfolds — programming is performative and negotiated and code is mutable.… software needs to be understood as an actant in the world — it augments, supplements, mediates, and regulates our lives and opens up new possibilities — but not in a deterministic way. Rather, software is afforded power by a network of contingencies that allows it to do work in the world.

From Kitchin and Dodge, we learn to view software not only as a product of human innovation but also as a tool that conditions human behavior and innovation. This is a modern, jargon-laden way of phrasing the relationship between chance and intention that I read in Tawhidi’s anecdote. In the most abstract terms, it is a to-and-fro between human activity and passivity.

This understanding of software should bear upon our understanding of the online archive. We ought to see beyond the seemingly objective, all-encompassing view of the internet.

Finding Something to Translate

Last winter break, I wanted to translate an explicitly feminist text from Arabic. I immediately found myself combing the internet for something I could translate, preferably something open access (open access being a broad international movement that seeks to grant free and open online access to academic information), so that I would not have to deal with copyright issues. My mode of research, basically scouring the internet, led me quite quickly to May Ziadeh, a Lebanese feminist whose writings had not been translated. I accessed her work on hindawi.org, a website for open access Arabic texts. Soon enough, after flipping through a few pages, I realized that her feminism was subtle, subdued; it was not what I was looking for. The suggestions tab on the website, however, recommended other texts and authors worth considering, including Hoda Shaarawi. The website had only two texts of hers, her diaries and a speech. The former has been translated, but the latter seemed not to have been. What’s more, the speech was undated and ‘unsituated’: it was unclear when and where she gave it. After a few hours scouring online library databases and search engines, I concluded that the only way to know for sure was to have access to the whole archive of the magazine that Shaarawi edited during her lifetime. However, none of the magazine’s issues were digitally archived and available online.

One thing led to another, and I eventually found yet another feminist, Doria Shafik, whose work, I am glad to say, I translated. Some scans of Bint al Neel (Daughter of the Nile), the feminist magazine she edited in Egypt, had been made available online by the American University in Cairo Libraries. The interface was clunky, but I eventually figured out how to navigate it. It was especially hard to separate the articles Shafik herself had written from the other articles in the digitized issues, a task that would have been much simpler with physical copies of the magazine at hand.

The Classification of the Online Database

Much of this process would be different in a library or physical archive (the difference between the two does not concern us much here). What I wish to comment on, however, is this “suggestions tab,” which took me from May Ziadeh to Hoda Shaarawi. Its analog version would be browsing for books in physical proximity in a library classified according to the Dewey Decimal System, which organizes the contents of a library by dividing all knowledge into 10 groups of 100 numbers each. If I go to a library and look for May Ziadeh’s writings, I will find similar writings on the adjacent shelves, i.e., writings of the same theme, genre, etc. While this is helpful, and has proven helpful in the past, it falls far short of what a website’s recommendation algorithm can do. In addition to relying on metadata or classification to offer suggestions, which is basically how Dewey classification works, software can gather aggregate user data to offer more specific suggestions based on, for example, what other works users viewed in conjunction with Ziadeh’s. Where in the library I encounter other Arabic feminist writings next to Ziadeh’s, an online database may suggest, for example, studies on Ziadeh, works by Ziadeh’s interlocutors, and other first-wave feminist works that I would not have known of or considered otherwise.
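To make the contrast concrete, here is a minimal Python sketch of the two logics, shelf proximity versus co-viewing. Everything in it is hypothetical: the titles, the Dewey-style call numbers, and the browsing sessions are invented, and counting how often two works are viewed together is only the simplest form of the aggregate-data approach described above. It is not a description of how hindawi.org or any other site actually computes its suggestions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical Dewey-style call numbers, invented for illustration only.
catalog = {
    "Study of Ziadeh": "305.4 ZIA",
    "Gibran: Letters": "892.7 GIB",
    "Shaarawi: Speech": "892.7 SHA",
    "Ziadeh: Essays": "892.7 ZIA",
}

# Hypothetical browsing sessions: the set of works one user viewed together.
sessions = [
    {"Ziadeh: Essays", "Shaarawi: Speech", "Study of Ziadeh"},
    {"Ziadeh: Essays", "Shaarawi: Speech"},
    {"Ziadeh: Essays", "Gibran: Letters"},
    {"Shaarawi: Speech", "Shafik: Articles"},
]


def shelf_neighbors(work, catalog, width=2):
    """Works shelved within `width` places of `work` when sorted by call number."""
    ordered = sorted(catalog, key=catalog.get)
    i = ordered.index(work)
    nearby = ordered[max(0, i - width): i + width + 1]
    return [w for w in nearby if w != work]


def co_view_suggestions(work, sessions, k=3):
    """Up to k works most often viewed in the same session as `work`.

    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter()
    for viewed in sessions:
        # Count each unordered pair once per session, in both directions.
        for first, second in combinations(sorted(viewed), 2):
            counts[(first, second)] += 1
            counts[(second, first)] += 1
    scores = {other: n for (first, other), n in counts.items() if first == work}
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:k]]


print(shelf_neighbors("Ziadeh: Essays", catalog))
# ['Gibran: Letters', 'Shaarawi: Speech']
print(co_view_suggestions("Ziadeh: Essays", sessions))
# ['Shaarawi: Speech', 'Gibran: Letters', 'Study of Ziadeh']
```

On the invented shelf, the study of Ziadeh never appears, because classification places it elsewhere. The co-viewing count surfaces it only because earlier users (here, imaginary ones) happened to view it alongside her essays. That dependence on other people's behavior is one of the places where bias can enter.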

It is in moments like these that the software/human dialectic is in full swing. The software, designed by humans, and the database, classified by humans, in turn learn from humans and help them sift through the excess that is aggregate human knowledge.

At the same time, it is important to point out the side effects of so mediated and guided a process. While such suggestions may speed up the research process, being presented with concrete suggestions, as opposed to browsing more aimlessly in a library, in a sense limits the researcher’s possibilities. Presented with many results, the researcher does not feel obliged to dig deeper, to look further, for she is already presented with an excess of alternatives. The problem with not digging deeper, however, is that this excess of search results is not neutral; it may be shaped by one or more of the following: the metadata entered in library databases, other users’ browsing history within the subject, the language and/or script in which one is searching, and more. In other words, algorithmic bias (whose consequences we will discuss later) creeps into the research stream. So how does one deal with such constraints? What can we learn from this speculation?
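The script in which one searches, one of the sources of bias listed above, can be illustrated with an equally small sketch. The records are hypothetical, each with its metadata entered in a single script, and the case-insensitive substring match is a deliberately naive stand-in for a real catalog search, not the behavior of any particular library database.

```python
# Hypothetical catalog records; each record's metadata was entered in one script only.
records = [
    {"title": "Bint al Neel, issue 1", "creator": "Doria Shafik"},
    {"title": "بنت النيل", "creator": "درية شفيق"},
    {"title": "Speech", "creator": "Hoda Shaarawi"},
]


def naive_search(query, records):
    """Return titles whose creator or title contains `query` (case-insensitive)."""
    q = query.casefold()
    return [
        r["title"]
        for r in records
        if q in r["creator"].casefold() or q in r["title"].casefold()
    ]


print(naive_search("Shafik", records))  # ['Bint al Neel, issue 1']
print(naive_search("شفيق", records))    # ['بنت النيل']
print(naive_search("Shafiq", records))  # []
```

Three queries for the same person return three different result sets. Shafik and Shafiq are both common transliterations of her name, and a researcher who stops at the first search never learns what the other spellings would have found.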

It is worth noting that algorithmic bias is by no means an undiscussed topic. As Safiya Umoja Noble notes in “Algorithms of Oppression,”

Part of the challenge of understanding algorithmic oppression is to understand that mathematical formulations to drive automated decisions are made by human beings. While we often think of terms such as “big data” and “algorithms” as being benign, neutral, or objective, they are anything but.

That said, little has been written about how algorithms of oppression bear on fields that are not primarily concerned with software and algorithms, such as, in our case, the humanities. This essay is thus an attempt (one of many, I hope) to put this scholarship, which stems from media and internet studies, in conversation with research practices in the humanities.

Moving from Internet as Object to Internet as Thing

Returning to Kitchin and Dodge,

Software then is not an immaterial, stable, and neutral product. Rather, it is a complex, multifaceted, mutable set of relations created through diverse sets of discursive, economic, and material practices.

In this light, code, the language of software, loses its rigidity and luster; it is no longer infallible, and trust in the power of the computer fades. Ultimately, if humans write the code, and humans are prone to error, then there is no guarantee that the code is free of error. Although this may sound didactic, it is important to articulate in the context of the digitization of archives and the development of online libraries. It is easy for researchers or scholars working with archives to assume that the “computer knows more”; since this is not always true, continually working to understand how algorithms work, and to improve them, is key to more robust research output. The task of the coders seems clear, namely to “work through, understand, and modify the code by reading and playing with it,” a task facilitated by conversations with, and feedback from, the researchers who use such databases (which is why an algorithm can do a lot even where it currently does not: it has the potential to be improved). Taking a few steps back, however, awareness of how software works is crucial for any researcher relying on digital research tools, for technical literacy helps ensure that the researcher does not become subordinate to the software. In other words, acknowledging the agency of the computer, and its limitations, is crucial to making the most of it, even if a seamless experience might seem to be the goal.

Robin Bernstein, albeit in a completely different context, elucidates the difference between “things” and “objects,” a discussion that I think further illuminates the argument at hand. She writes,

Things, but not objects, script actions. . . . an object [is] . . . a chunk of matter that one looks through or beyond to understand something human. A thing, in contrast, asserts itself within a field of matter. For example, when an amateur cook uses a knife to chop an onion, the knife might function as an object that the amateur barely notices; in this scenario, the knife is only a tool used to obtain the chopped onion that the human desires. For a trained chef, however, a knife can never be an object: for such a person, each edge of a knife glitters individually with potential and stubbornness, with past, present, and future motions of slicing and chopping. The trained chef’s knife is thus a thing with which a chef negotiates, while an amateur’s knife is an object to the extent that it is only a means to an end. If the amateur’s knife should slip and cut a finger, however, that knife suddenly becomes a thing that has leapt up and asserted itself, a thing that demands to be reckoned with. The difference between objects and things, then, is not essential but situational and subjective.

Thinking with Bernstein, I take the thing/object at hand in our case to be the online database or archive, as well as its user interface. And I am arguing for a researcher who is like the trained chef: working seamlessly, not bogged down by the thing, but at the same time aware of its limitations and processes, in other words, aware of how it works. For if this software remains an object for the researcher, she will be limited by it, unable to see its limitations and what it cannot do, and instead fascinated by what it can do.

Bringing this conversation into the realm of software, David Berry theorizes what he calls a “phenomenology of software,” voicing his worry about our unconscious over-reliance on software. He writes,

I want to suggest that what is happening in the ‘digital age’ is that we increasingly find a computational dimension inserted into the ‘given’. . . . For example, ‘for writing’ increasingly becomes ‘for the processing of words towards-which a final document is output’, or ‘going in and out’ becomes ‘the exit or entrance towards-which one is involved in a process of attending to or withdrawing from a particular process’.

My issue is not so much with the fact that “‘for writing’ increasingly becomes ‘for the processing of words towards-which a final document is output,’” but rather that we are oftentimes unaware of this substitution. In fact, I would argue that knowledge of this substitution is crucial. When we are unaware that software mediates our research happenstances, when we forget that the computer is not an objective entity unconditioned by material reality, its circumstances of production, and its producers themselves, we risk not doing our best research, and what is supposed to support us ends up working against us.

Concluding Thoughts

This essay is not a call for cynicism about research tools and methods. Instead, it is a call for increased awareness of these tools’ limitations. Whatever the digital research tool might be (the internet, online archives, digital search engines), it is not perfect and, like any other tool, has its limitations. While the suggestions algorithm helped me come across a feminist author I might not have encountered using other research tools, it also made me take a step back and reflect on the online/digital research process. Like any technological development, digital/online databases and archives have increased our research speed and capacity, and we must make use of these tools while remaining cognizant of their structures, limitations, and biases.

As much as I’d like to think that being part of the computer-native generation makes me attuned to how software works, such that software is a thing rather than an object for me, I still have many unanswered questions about online libraries, archives, databases, etc. I hope that this article starts a much-needed conversation among students, scholars, and researchers who, if they are not aware of how algorithms and databases work, fall victim to these seemingly neutral tools.


We believe in furthering the conversation on digital archiving beyond this article. Here are a few resources from Tom to get that started:

1. The Philosophy of Software: Code and Mediation in the Digital Age | David M. Berry

2. Code/Space: Software and Everyday Life | Rob Kitchin and Martin Dodge

3. “Software as Co-Creator in Interactive Documentary,” in I-Docs: The Evolving Practices of Interactive Documentary | Craig Hight

4. Protocol: How Control Exists After Decentralization | Alexander R. Galloway

5. Algorithms of Oppression: How Search Engines Reinforce Racism | Safiya Umoja Noble

6. What Does “Born Digital” Mean? | Elias Muhanna

---

* An earlier version of this article mistranslated this quote. The author would like to thank Professor Maurice Pomerantz for pointing out and correcting the error in the translation.