Sharing the love by sharing the data

By UVA Library

It’s Love Data Week! This week we’re featuring guest contributors from the Library’s Research Data Services team. Today’s post comes from Joe Edgerton, Research Data Management Librarian. 

International Love Data Week: February 9-13, 2026

Most people don’t look at a dataset and think about all the work put into making it usable; it’s just available, usually on a website or in a repository. However, librarians and data stewards who curate data think about this a lot. Whether providing a few best practices to researchers or checking if datasets are “Findable, Accessible, Interoperable, and Reusable” (“FAIR”), it all boils down to making research data easier to use.

For this year’s Love Data Week celebration, I encourage data creators to think about the potential reach of their data and how to make datasets more shareable. I will also cover some typical checks that data curators perform when reviewing dataset submissions.

The drive for sharing data

The 2022 Public Access Plan issued by the Office of Science and Technology Policy (OSTP), known as the “Nelson Memo,” encourages researchers with federal grants to share their work in an open, accessible way, which includes depositing research data into data repositories.

However, the simple act of sharing your data is no guarantee that the data can be reused (Borgman and Groth, 2025). Data librarians, stewards, and researchers continue to see research data and code that are incomplete and not reusable. We also see specific communities developing their own data management guidance to address these issues (e.g., SORTEE Guidelines for Ecology and Evolutionary Biology). In extreme cases, where a research project has little-to-no plan for sharing or managing data, substantial remediation effort is needed to triage and save the data (Llebot, 2018).

That’s all to say: the work involved in getting datasets ready to share is worthy of attention.

Cleaning and organizing the data before guests arrive

To some researchers, data cleaning may just be the step before data analysis. While this is true, data cleaning also has some data management benefits that are not always obvious. Cleaning your data helps you find errors and reduce mistakes, but it also helps make your data more transparent and more usable by others. So, what exactly can data curators or creators do to increase potential reusability?

Data curators typically have a few suggestions to improve the shareability or reusability of datasets being deposited in a repository. For instance, if you have quantitative data in tabular formats, like .xlsx, a curator will check files against tidy data principles that make data more machine-actionable. Untidy data can have organization issues ranging from multiple tables in a single spreadsheet to a cell that contains more than one value. Multiple tables in one spreadsheet make it harder to discern observations and variables, and can confuse analytical programs when data are being parsed. Multiple values in a cell make it hard to count occurrences of a value and to compute summaries over all values in a column.

The three principles of tidy data: Each column represents one variable, each row represents one observation, each cell represents one value. Images from Camila Vargas Poulsen, Casey Ohara, Shayna Sura (2024), NCEAS coreR for Delta Science Program, October 2024, NCEAS Learning Hub. https://learning.nceas.ucsb.edu/2024-10-coreR.
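To make the "one value per cell" principle concrete, here is a minimal sketch in Python using only the standard library. The table, the column names, and the `split_multivalue_cells` helper are all hypothetical and made up for illustration; a real dataset may need a different separator or a different approach entirely.

```python
import csv
import io
from collections import Counter

# Hypothetical untidy table: the "species" cell can hold several values at once.
UNTIDY_CSV = """site,species
A,oak; maple
B,pine
C,oak; pine; birch
"""


def split_multivalue_cells(rows, column, sep=";"):
    """Return new rows so that `column` holds exactly one value per row."""
    tidy = []
    for row in rows:
        for value in row[column].split(sep):
            new_row = dict(row)  # copy so the original row is left untouched
            new_row[column] = value.strip()
            tidy.append(new_row)
    return tidy


untidy_rows = list(csv.DictReader(io.StringIO(UNTIDY_CSV)))
tidy_rows = split_multivalue_cells(untidy_rows, "species")

# With one value per cell, counting occurrences becomes a simple tally.
species_counts = Counter(row["species"] for row in tidy_rows)
```

After the split, each row describes one site and one species, so a count of how often "oak" appears no longer requires parsing text inside cells.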

 

Along with the format of the data, curators also check for appropriate values, missing values (e.g., “NA,” blank), and proper headers.
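Some of these checks can be partly automated. The sketch below is one possible approach, not a description of any curator's actual workflow: the sample table, the `review_table` function, and the set of missing-value markers are assumptions chosen for illustration.

```python
import csv
import io

# Hypothetical submission: one blank header, plus an "NA" cell and an empty cell.
SUBMITTED_CSV = """site,,temp_c
A,2024-05-01,18.2
B,2024-05-02,NA
C,,17.9
"""

# Assumed conventions for "missing"; adjust these for the dataset at hand.
MISSING_MARKERS = {"", "NA", "N/A", "NULL"}


def review_table(text):
    """Report blank headers, duplicate headers, ragged rows, and missing cells."""
    reader = csv.reader(io.StringIO(text))
    headers = next(reader)
    blank_headers = [i for i, h in enumerate(headers) if not h.strip()]
    duplicate_headers = sorted(
        {h for h in headers if h.strip() and headers.count(h) > 1}
    )
    missing_per_column = [0] * len(headers)
    ragged_rows = 0
    for row in reader:
        if len(row) != len(headers):
            ragged_rows += 1  # row has more or fewer cells than the header
        for i, cell in enumerate(row[: len(headers)]):
            if cell.strip() in MISSING_MARKERS:
                missing_per_column[i] += 1
    return {
        "blank_headers": blank_headers,
        "duplicate_headers": duplicate_headers,
        "ragged_rows": ragged_rows,
        "missing_per_column": missing_per_column,
    }


report = review_table(SUBMITTED_CSV)
```

A report like this only flags places to look. Deciding whether a blank cell means "not measured," "not applicable," or "zero" still takes someone who knows the data.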

For qualitative data, the checking process can involve reviewing transcripts for errors or identifying sensitive information that needs to be removed. It could also involve changing proprietary file formats into formats that are better suited for sharing (e.g., a text format like .txt).
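As a rough first pass at spotting sensitive information in a transcript, a simple pattern search can flag lines for a human to review. This is only a sketch: the two patterns below (email addresses and one common phone number layout) and the `flag_sensitive` function are illustrative assumptions, and they will miss names, addresses, and anything else that doesn't follow a fixed pattern.

```python
import re

# Illustrative patterns only; real transcripts need a much broader review.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")


def flag_sensitive(lines):
    """Return (line_number, label, matched_text) for each pattern match."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in (("email", EMAIL_RE), ("phone", PHONE_RE)):
            for match in pattern.finditer(line):
                flagged.append((number, label, match.group()))
    return flagged


# Hypothetical two-line transcript excerpt.
transcript = [
    "Interviewer: Thanks for joining today.",
    "Participant: You can reach me at pat@example.org or 555-010-0199.",
]
flags = flag_sensitive(transcript)
```

The output is a list of places to check, not a redacted file. Removing or masking the flagged text should remain a deliberate, reviewed step.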

Are formatted and cleaned data enough for reuse?

Even if your spreadsheets are “tidy” or your images are well-annotated, that alone is not enough to make the data easy to reuse. A good metaphor would be gathering all the right ingredients but using a recipe that isn’t clear or easy to follow.

Having enough metadata or documentation is critical to making your data effective both for yourself and others. I can’t count the number of times I have made a folder, placed some files in it, and then forgotten what purpose each file serves or why it was needed. Describing your files, methods, and values are all data management practices that can be addressed with a little planning and dedicated time.
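One low-effort way to start documenting values is a data dictionary: a list of every variable with a description and units. The sketch below generates an empty skeleton from a table's column headers. The sample data, the field names in the skeleton, and the `data_dictionary_skeleton` function are hypothetical; the descriptions themselves still have to be written by someone who knows the data.

```python
import csv
import io

# Hypothetical dataset used only to show the shape of the output.
DATA_CSV = """site,date,temp_c
A,2024-05-01,18.2
B,2024-05-02,17.5
"""


def data_dictionary_skeleton(text):
    """Build one entry per column, with blank fields for the creator to fill in."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return [
        {
            "variable": name,
            "description": "",  # to be written by the data creator
            "units": "",  # to be written by the data creator
            "example_value": rows[0][name] if rows else "",
        }
        for name in reader.fieldnames
    ]


dictionary = data_dictionary_skeleton(DATA_CSV)
```

Filling in the blanks for a column like `temp_c` (what was measured, with which instrument, in which units) is exactly the kind of detail that is easy to record now and hard to reconstruct later.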

For data cleaning and documentation tips, see our “Preparing Data for Sharing” checklist.

Different data, same mission

All disciplines have their own data types, formats, and norms. The steps taken to collect and clean a dataset of microscopy images are completely different from the steps taken for a dataset of historical full-text English novels. What research datasets have in common is the need to share and preserve them so others can view and potentially reuse them in the future.

Whether you are someone who reviews datasets or someone who creates them, there is a universal goal for anyone involved in sharing datasets: close the “distance” between data creator and data user (Borgman and Groth, 2025). Try to always think about your data audience and what information they will need to reuse the data you’re sharing.

Ask the UVA Library about depositing data

If you have a dataset and want to make sure it can support reuse, feel free to email the UVA Research Data Management team at dmconsult@virginia.edu. You can also schedule a consultation to ask us any questions you have. Finally, check out our Research Data Management guide for tips and resources.

Loving Love Data Week? Read this week’s earlier post from Laura Hjerpe: “Where’s the data??”

 

References: