Chapter 3 Organizing Information and Data for Interdisciplinary Projects

3.1 Introduction

Efficient information storage and organization is integral to interdisciplinary and highly collaborative research (IHCR). Even though working with IHCR data is in many ways similar to working with data that comes from disciplinary projects, there are some important differences:

  • Many IHCR projects are grant-funded, so the need for efficient long-term management of data is more pressing considering that the funding may not be renewed.
  • IHCR projects are characterized by data and methodological heterogeneity, i.e., the data collection and analysis techniques come from various domains that have varying rules and procedures.
  • IHCR projects often use online platforms in addition to traditional scholarship venues (papers, conferences) to disseminate their work.
  • IHCR projects are often “cross-institutional,” which makes storage and organization even more complex.

Due to their shorter and more heterogeneous nature, IHCR projects require more coordination. A consistent approach to information storage and organization can help reduce the burden of coordination and allow the project to track and update resources as needed.

The workflow below provides a set of recommended steps for organizing data and information.

3.2 Workflow Diagram

3.3 Detailed Description of Steps

3.3.1 Set goals

The goals of data and information organization (which in most practical cases is synonymous with file organization) need to be aligned with the goals of the project. For research projects, the goals usually include collecting and analyzing data and delivering results in the form of papers and presentations. For IHCR projects, the goals also need to address their heterogeneous nature. In addition to the project goals, goal-setting for file organization includes understanding who will be using the file system.

3.3.2 Determine storage needs

Determining storage needs includes determining the amount of storage that will be needed for the project, modes and rules of access, existing storage options, and the alignment between storage and computation. To determine project storage needs, consider the following:

  • With whom the data will be shared regularly (e.g., collaborators at your institution, external collaborators)?
  • What is the classification of your data (e.g., public, sensitive, confidential/restricted, HIPAA-compliant)? Many universities have classifications of their institutional data, which can be used as a guideline.
  • What are your data safety procedures? (e.g., number of backup copies, backup locations, frequency of snapshots).
  • How much data do you have and how fast will it grow? (e.g., under 1TB/year, over 1TB/year).
  • Do you have special performance needs? (e.g., thousands of files, frequent transfer, large computational needs).
  • How are you expecting to access the data? (e.g., from own computer, remotely from any computer).
  • Who is responsible for the data storage and its maintenance? (e.g., yourself, another team member, your institution).
  • What is your budget for data storage and management and where does it come from? (e.g., no budget, fixed amount from grant, ongoing support from institution).

3.3.3 Evaluate options and select the appropriate storage

To evaluate storage options, the team may want to consult with institutional IT, the library, or a dedicated data management group at their respective institutions.

3.3.4 Organize files into folders and sub-folders

Folder names and structure are highly dependent on the nature of the project. No files should be stored outside of folders, and each file is related to a category within an area. Start with a minimal hierarchy and add complexity only when certain that it will simplify daily work.

Below is a suggested structure that can be adapted to many research projects:

  • Administration
    • Hiring
    • Finances
    • ..
  • Grant materials
    • Proposal
    • Reports
    • ..
  • Instruments
    • Questionnaires
    • ..
  • Data
    • Collected
    • Processed
    • Analysis
    • Publications and presentations
    • Archive

3.3.5 Establish a file naming convention

Define your file naming convention based on aspects that are important to the project. Use descriptive names that include a) a date, b) experiment conditions, c) location, d) researcher name/initials, e) version number.

Decide whether to use underscore (_), hyphen (-), or camelCase consistently in file names between the aspects chosen.

Include dates in the YYYYMMDD or YYYY-MM-DD format at the beginning of the name to keep files in chronological order date_content

3.3.6 Consider using tags and controlled vocabularies

File names may be more useful if standardized information is added to them. For example, the filenames may include designations for sharing status, readiness, and so on.

  • Data: [raw] [processed] [reshaped] [cleaned]
  • Sharing: [noshare] [shared] [public]
  • Readiness: [first draft] [revision] [final]
  • Funding: [submitted] [rejected] [awarded]

3.3.7 Establish a system for version control

Most commonly used method includes a combination of date, sequential numbering and researcher initials for version tracking. This method is prone to errors and requires a lot of effort in tagging / renaming each file.

A better way for version control is to use the established version control systems such as Git, CVS, or SVN, however, they require investments in learning, training, and adoption.

3.3.8 Define access permissions

Determine who on the team should be able to access and/or edit which files. Implement access controls as needed to ensure file integrity and/or comply with requirements for data that has been classified as sensitive, confidential, HIPAA-protected, etc.

3.3.9 Solicit input from team members

Once the system has been in place for a reasonable amount of time, reach out to the team members and solicit their input on the following:

  • What works well in the current system?
  • What is not working with the current system?
  • What is missing from the current system?
  • Discuss and review the feedback and determine what changes can improve the system without adding too much burden to maintenance and implementation.

3.3.10 Maintain the system

Schedule recurring maintenance time to review filing practices and move misplaced files. Review the existing files and folders and delete or archive what is not needed. Designate someone on your team as the go-to person for all file-organizing questions. Check in with your team periodically to see whether your organization system is working and adjust as necessary.

3.4 Resources and Examples

Illinois Library File Naming Conventions

How to Design File Management for a Company

Gentzkow, M. & Shapiro, J. M. (2014). Code and Data for the Social Sciences: A Practitioner’s Guide. pdf

Data Storage Finder at Cornell University

Store, share, and collaborate on institutional files at Indiana University

File Transfer, Storage, and Infrastructure at University of Colorado Boulder

File naming apps: Bulk Rename Utility (Windows), Renamer (Mac), PSRenamer