Data Modeling and Structuring Fields
Structuring a Humanities Dataset
Humanists have a complicated relationship with data. As Christof Schöch, Miriam Posner, and Katie Rawson and Trevor Muñoz have highlighted, there is a tension between traditional humanistic approaches to studying objects and data-based approaches: humanistic inquiry privileges difference and complexity, while representing objects as data requires layers of abstraction and imposed uniformity.
One of the main benefits of Airtable, as William K. Dewey points out, is that its permissive approach to building databases can keep data human-readable. Because Airtable encourages semantically meaningful primary fields, allows many-to-many relationships directly between two tables, and offers lookup fields that bring in information from other tables, research teams can easily make a database that facilitates "humanistic engrossment," to borrow a term from Bethany Nowviskie.
One downside to the "human-readable" approach is that the data may not be ideally structured for computational analysis. Certain decisions that may make a dataset human-readable, such as storing multiple variables in one column or keeping multiple observations in the same table, break Tidy Data principles and make it difficult to analyze that data. However, as Matthew Lincoln has written about in relation to Google Sheets, data can always be "tidied" down the road. So if certain "un-tidy" decisions help your team members understand and work with the data, it may be appropriate to break certain conventions upfront.
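To make "tidying down the road" concrete, here is a minimal sketch in Python. It assumes a human-readable table in which several authors share one cell, separated by semicolons. The sample rows, the separator, and the `tidy_authors` helper are all hypothetical and are not part of this tutorial's dataset.

```python
# Hypothetical "human-readable" rows: several authors share one cell.
untidy = [
    {"title": "Book A", "author": "Lee, Ana; Ortiz, Sam"},
    {"title": "Book B", "author": "Khan, Noor"},
]


def tidy_authors(rows, sep=";"):
    """Return one row per (title, author) pair."""
    tidy = []
    for row in rows:
        for name in row["author"].split(sep):
            name = name.strip()
            if name:  # skip empty pieces left by stray separators
                tidy.append({"title": row["title"], "author": name})
    return tidy


tidy = tidy_authors(untidy)
for row in tidy:
    print(row)
```

The untidy version stays easy to read and edit by hand, while the tidy version has one observation per row and can be counted or grouped by author.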
When determining how to structure your database, consider the aim of your Airtable base:
- Is the base primarily for data storage? Or will your research team be engaging directly with the data in the base?
- Will you be sharing the Airtable base with the public?
- What will your data entry workflow look like?
- Is data being created in Airtable or imported?
Data Modeling
The process by which we determine how to represent our objects of study is called data modeling. This includes deciding what attributes we want to document and what relationships we want them to have. The inverse of this is also true: data modeling determines what objects, attributes, and relationships we won't document. As Johanna Drucker writes, "Almost all data are partial and represent some features of a phenomenon and not others. Policies of inclusion and exclusion operate to reify and reinforce biases, making them seem natural."
There is no single approach to modeling any object or system. The model that you build can be influenced by many factors, including:
- What objects, attributes, or relationships speak to your research question?
- What data structure do you intend to use (e.g. relational database, XML, flat database)?
- What data do you have access to?
- Are there any controlled vocabularies you can use to describe attributes?
- Are there existing data models you can base your data collection on?
The data modeling process can be complex and, at times, intimidating. It deserves more attention than I can provide in this tutorial, so I encourage you to check out the resources in the further reading section.
Book Database: Creating Fields
Let's return to our example database of sound studies books. Conveniently, we can start with the data model provided to us by Zotero, which is structured as a flat database. In this model, each book is an individual record, but the books don't have defined relationships with each other or with the entities that created them (we'll add more complexity to the model as the tutorial continues).
The original Zotero export contained more than 50 fields, many of them blank, so we will want to pare down the fields to a more workable subset. Knowing that we eventually want to create relationships between books and their creators, I've decided to keep fields that potentially have overlapping relationships. The final set of fields we'll work with is: title, author, editor, year, publisher, series, and place.
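If you would rather trim the export before importing it, the paring-down step can be sketched in Python. This is a sketch under stated assumptions: the column headers in `FIELD_MAP` are my guess at what a Zotero CSV export uses, so check them against your own file. The `pare_down` helper and the sample row are hypothetical.

```python
import csv
import io

# Assumed mapping from Zotero CSV column headers to the field names used in
# this tutorial. Verify these headers against your own export.
FIELD_MAP = {
    "Title": "title",
    "Author": "author",
    "Editor": "editor",
    "Publication Year": "year",
    "Publisher": "publisher",
    "Series": "series",
    "Place": "place",
}


def pare_down(csv_text):
    """Keep only the mapped columns, renamed; missing columns become ''."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {new: row.get(old, "") for old, new in FIELD_MAP.items()}
        for row in reader
    ]


# A made-up export with extra columns, standing in for the real file.
sample = (
    "Key,Title,Author,Publication Year,Publisher,Place,Extra\n"
    'ABC123,Example Title,"Doe, Jane",2003,Example Press,Durham,\n'
)
books = pare_down(sample)
print(books[0])
```

To run this against a real export, read the file's text with `open(path, newline="", encoding="utf-8")` and pass it to `pare_down`.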
Before adding fields to Airtable, we also have to determine the appropriate field type for each attribute. This is an important step for data validation, as it ensures that all shared attributes are uniformly structured. This is particularly important for allowing dates and numbers to be compared, aggregated, and analyzed.
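A short Python sketch shows why this matters for a field like year. The values below are invented and the `parse_year` helper is hypothetical. The point is that free-text entries such as "c. 1977" cannot be subtracted or averaged, whereas a number-typed field rejects them at entry time.

```python
def parse_year(value):
    """Return the year as an int, or None if the text is not a plain year."""
    text = value.strip()
    return int(text) if text.isdecimal() else None


# Invented entries of the kind a free-text "year" column might collect.
raw_years = ["1999", " 2003", "c. 1977", ""]

parsed = [parse_year(v) for v in raw_years]
valid = [y for y in parsed if y is not None]

print(parsed)                   # entries that are not plain years become None
print(max(valid) - min(valid))  # arithmetic only works on the numeric values
```

In Airtable the Number field type performs this check for you. A script like this is only needed if you clean the data before import.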
Airtable Field Types
While we'll only be using a limited subset of field types for this tutorial, Airtable supports the following:
- Attachment fields
- Date-based fields
- Formula fields
- Linked record fields
- Number-based fields
- Rollup, lookup, and count fields
- Single and multiple select fields
- User fields
Assigning Types
Here is a table of our example fields broken down by field type, any additional arguments, and description:
Name | Airtable Field Type | Additional Arguments | Description (from Zotero) |
---|---|---|---|
title | Single line text (primary field) | | The principal title of an item. Should be entered in sentence case. |
id | Formula | RECORD_ID() | Unique ID assigned by Airtable. |
author | Single line text | | The principal author or creator of a work. Enter authors (and other creators) in the order that they should be cited. |
editor | Single line text | | The editor of an item or the broader publication an item is part of (e.g., book, journal). |
year | Number | Decimal Places: 0 ; Thousands separator: false | Year of publication. |
publisher | Single line text | | The publisher of an item. |
series | Single line text | | Name of a series that contains multiple publications or presentations. |
place | Single line text | | The place of publication for an item. |
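For readers who think in code, the flat model in the table above can be sketched as a Python dataclass. This is only an illustration of the model's shape, not something Airtable requires or generates. The `Book` class and the placeholder values are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Book:
    """One flat record; field names follow the table above."""

    title: str                  # primary field
    id: str                     # in Airtable this comes from RECORD_ID()
    author: str = ""
    editor: str = ""
    year: Optional[int] = None  # whole number, no thousands separator
    publisher: str = ""
    series: str = ""
    place: str = ""


# Placeholder values, not a real record from the dataset.
book = Book(title="Example title", id="recXXXXXXXXXXXXXX", year=2003)
print([f.name for f in fields(book)])
```

Every attribute except year is plain text, so nothing in this model yet links a book to its author, publisher, or series as separate entities.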
Adding Fields to the Table
Now that we have a model for our data, you can return to the Sound Studies Books base, open the books table, and add these fields. This can be done by clicking the + icon to the right of the last field and setting field preferences in the popup.
Your Airtable base should now have one table, one grid view, and eight fields.
Further Reading
Humanities Data
- Schöch, Christof. “Big? Smart? Clean? Messy? Data in the Humanities.” Journal of Digital Humanities 2, no. 3 (Summer 2013).
- Posner, Miriam. “Humanities Data: A Necessary Contradiction.” Miriam Posner’s Blog (blog), June 25, 2015.
- Rawson, Katie, and Trevor Muñoz. “Against Cleaning.” In Debates in the Digital Humanities 2019, edited by Matthew K. Gold and Lauren F. Klein. University of Minnesota Press, 2019. https://doi.org/10.5749/j.ctvg251hk.
Data Modeling
- Flanders, Julia, and Fotis Jannidis. “Data Modeling in a Digital Humanities Context: An Introduction (preprint).” In The Shape of Data in Digital Humanities: Modeling Texts and Text-Based Resources. London: Routledge, 2018.
- Nowviskie, Bethany. “Capacity through Care.” In Debates in the Digital Humanities 2019, edited by Matthew K. Gold and Lauren F. Klein, 424–26. University of Minnesota Press, 2019. https://doi.org/10.5749/j.ctvg251hk.40.
- Drucker, Johanna. “Data Modeling and Use.” In The Digital Humanities Coursebook: An Introduction to Digital Methods for Research and Scholarship, 1st ed., 19–33. Abingdon, Oxon; New York: Routledge, 2021. https://doi.org/10.4324/9781003106531.
- Ciula, Arianna, Øyvind Eide, Cristina Marras, and Patrick Sahle. Modelling Between Digital and Humanities: Thinking in Practice. Open Book Publishers, 2023. https://doi.org/10.11647/obp.0369.
Tidy Data
- Wickham, Hadley. “Tidy Data.” Journal of Statistical Software 59 (September 12, 2014): 1–23. https://doi.org/10.18637/jss.v059.i10.
- Lincoln, Matthew D. “Tidy Data for the Humanities.” Matthew Lincoln, PhD (blog), May 26, 2020.
- Library Carpentry. “Tidy Data for Librarians (Lesson).” Accessed May 2, 2025. https://librarycarpentry.github.io/lc-spreadsheets/.
Airtable Bases
- Airtable Support. “Fields Overview.” Accessed May 6, 2025.
- Airtable. “6 Common Airtable Design Decisions.” Accessed May 9, 2025.
This tutorial is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).