Data Modeling and Structuring Fields

Structuring a Humanities Dataset

Humanists have a complicated relationship with data. As Christof Schöch, Miriam Posner, and Katie Rawson and Trevor Muñoz have highlighted, there is a tension between traditional humanistic approaches to studying objects and data-based approaches: humanistic inquiry privileges difference and complexity, while representing objects as data requires layers of abstraction and imposed uniformity.

One of the main benefits of Airtable, as William K. Dewey points out, is that its permissive approach to building databases can keep data human-readable. Because Airtable encourages semantically meaningful primary fields, allows many-to-many relationships directly between two tables, and offers lookup fields that bring in information from other tables, research teams can easily make a database that facilitates "humanistic engrossment," to borrow a term from Bethany Nowviskie.

One downside to the "human-readable" approach is that the data may not be ideally structured for computational analysis. Certain decisions that make a dataset human-readable, such as storing multiple variables in one column or keeping multiple observations in the same table, break Tidy Data principles and make the data harder to analyze. However, as Matthew Lincoln has written in relation to Google Sheets, data can always be "tidied" down the road. So if certain "un-tidy" decisions help your team members understand and work with the data, it may be appropriate to break certain conventions up front.
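
As one illustration of tidying "down the road," the sketch below splits a human-readable cell that lists several authors into one row per book-author pair. The sample rows, the semicolon separator, and the function name `tidy_authors` are assumptions made for this example; they are not part of the tutorial's dataset.

```python
# Made-up sample rows: several authors share one cell, which is easy to read
# but stores multiple values in a single column.
untidy_rows = [
    {"title": "Example Book A", "author": "Author One; Author Two"},
    {"title": "Example Book B", "author": "Author Three"},
]


def tidy_authors(rows, separator=";"):
    """Return one row per (title, author) pair."""
    tidy = []
    for row in rows:
        for name in row["author"].split(separator):
            name = name.strip()
            if name:  # skip empty pieces left by trailing separators
                tidy.append({"title": row["title"], "author": name})
    return tidy


tidy_rows = tidy_authors(untidy_rows)
for row in tidy_rows:
    print(row["title"], "|", row["author"])
```

The untidy version stays in place for the team to read, and the tidy version can be generated whenever analysis calls for it.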

When determining how to structure your database, consider the aim of your Airtable base:

Is the base primarily for data storage? Or will your research team be engaging directly with the data in the base?
Will you be sharing the Airtable base with the public?
What will your data entry workflow look like?
Is data being created in Airtable or imported?

Data Modeling

The process by which we determine how to represent our objects of study is called data modeling. This includes deciding what attributes we want to document and what relationships we want them to have. The inverse of this is also true: data modeling determines what objects, attributes, and relationships we won't document. As Johanna Drucker writes, "Almost all data are partial and represent some features of a phenomenon and not others. Policies of inclusion and exclusion operate to reify and reinforce biases, making them seem natural."

There is no single approach to modeling any object or system. The model that you build can be influenced by many factors, including:

What objects, attributes, or relationships speak to your research question?
What data structure do you intend to use (e.g. relational database, XML, flat database)?
What data do you have access to?
Are there any controlled vocabularies you can use to describe attributes?
Are there existing data models you can base your data collection on?

The data modeling process can be complex and, at times, intimidating. It deserves more attention than I can provide in this tutorial, so I encourage you to check out the resources in the further reading section.

Book Database: Creating Fields

Let's return to our example database of sound studies books. Conveniently, we can start with the data model provided to us by Zotero, which is structured as a flat database. In this model, each book is an individual record, but the books don't have defined relationships with each other or with the entities that created them (we'll add more complexity to the model as the tutorial continues).

The original Zotero export contained more than 50 fields, many of them blank, so we will want to pare down the fields to a more workable subset. Knowing that we eventually want to create relationships between books and their creators, I've decided to keep fields that potentially have overlapping relationships. The final set of fields we'll work with is: title, author, editor, year, publisher, series, and place.
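
If you would rather pare down the export with a script before importing it, a minimal sketch might look like the following. It assumes the export is a CSV file whose column headers already match the seven field names above; a real Zotero export's headers will likely differ and would need to be mapped first. The in-memory sample data and the function name `pare_down` are invented for illustration.

```python
import csv
import io

# The seven fields kept in this tutorial.
KEEP = ["title", "author", "editor", "year", "publisher", "series", "place"]

# Stand-in for an exported CSV file. Column names and values are invented;
# "extra" and "notes" represent the many fields we want to drop.
export = io.StringIO(
    "title,author,editor,year,publisher,series,place,extra,notes\n"
    "Example Book A,Author One,,2003,Example Press,,Example City,,\n"
)


def pare_down(csv_file, keep):
    """Read CSV rows and keep only the named columns, in the given order."""
    reader = csv.DictReader(csv_file)
    return [{field: row.get(field, "") for field in keep} for row in reader]


books = pare_down(export, KEEP)
print(books[0])
```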

Before adding fields to Airtable, we also have to determine the appropriate field type for each attribute. This is an important step for data validation, as it ensures that all shared attributes are uniformly structured. This is particularly important for allowing dates and numbers to be compared, aggregated, and analyzed.
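
To see why the type matters for comparison and aggregation, consider this small Python sketch. It is not Airtable code, and the year values are made up; it only shows how the same values behave when stored as text versus as numbers.

```python
# The same three (made-up) years, first as text and then as numbers.
years_as_text = ["1999", "2003", "987"]
years_as_numbers = [int(year) for year in years_as_text]

# Text is sorted character by character, so "987" lands after "2003".
sorted_text = sorted(years_as_text)
# Numbers are sorted by value.
sorted_numbers = sorted(years_as_numbers)

print(sorted_text)     # ['1999', '2003', '987']
print(sorted_numbers)  # [987, 1999, 2003]

# Arithmetic such as a date range only works on the numeric version.
span = max(years_as_numbers) - min(years_as_numbers)
print(span)            # 1016
```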

Airtable Field Types

While we'll only be using a limited subset of field types for this tutorial, Airtable supports many more.

Assigning Types

Here is a table of our example fields broken down by field type, any additional arguments, and description:

Name	Airtable Field Type	Additional Arguments	Description (from Zotero)
`title`	Single line text (primary field)		The principal title of an item. Should be entered in sentence case.
`id`	Formula	`RECORD_ID()`	Unique ID assigned by Airtable.
`author`	Single line text		The principal author or creator of a work. Enter authors (and other creators) in the order that they should be cited.
`editor`	Single line text		The editor of an item or the broader publication an item is part of (e.g., book, journal).
`year`	Number	Decimal Places: `0`; Thousands separator: `false`	Year of publication.
`publisher`	Single line text		The publisher of an item.
`series`	Single line text		Name of a series that contains multiple publications or presentations.
`place`	Single line text		The place of publication for an item.
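
The same model can also be written down as data, which is handy for checking rows before they are entered or imported. The sketch below is one possible way to do that in Python. Treating "Single line text" as `str` and "Number" with zero decimal places as `int` is an assumption made for this example, as are the names `FIELD_TYPES` and `coerce_record`. The `id` field is left out because Airtable computes it with `RECORD_ID()`.

```python
# Field name -> Python type standing in for the Airtable field type.
FIELD_TYPES = {
    "title": str,
    "author": str,
    "editor": str,
    "year": int,
    "publisher": str,
    "series": str,
    "place": str,
}


def coerce_record(raw):
    """Convert raw string values to the modeled types and list any problems."""
    record, problems = {}, []
    for name, kind in FIELD_TYPES.items():
        value = raw.get(name, "").strip()
        if value == "":
            record[name] = None  # blank cell
            continue
        try:
            record[name] = kind(value)
        except ValueError:
            record[name] = None
            problems.append(f"{name}: cannot read {value!r} as {kind.__name__}")
    return record, problems


# Made-up rows: one clean, one with a year that is not a plain number.
good, good_problems = coerce_record({"title": "Example Book A", "year": "2003"})
bad, bad_problems = coerce_record({"title": "Example Book B", "year": "c. 2003"})
print(good["year"], good_problems)
print(bad["year"], bad_problems)
```

A value such as "c. 2003" is reported rather than silently stored, which mirrors what the Number field type enforces inside Airtable.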

Adding Fields to the Table

Now that we have a model for our data, you can return to the Sound Studies Books base, open the books table, and add these fields. This can be done by clicking the + icon to the right of the last field and setting field preferences in the popup.

Your Airtable base should now have one table, one grid view, and eight fields: title, id, author, editor, year, publisher, series, and place.
