In many ways, dealing with data is akin to tucking in your toddler: Simple in theory, yet fraught with challenge. As genomic research success hinges on large data sets, managing that information properly is vital to enhancing disease resilience in the pork sector.
“This project deals with a lot of pigs, so I created a web-accessible database for searching, viewing and sharing the huge data sets we generate,” said Jason Grant, research associate – bioinformatics in the Faculty of Agricultural, Life and Environmental Sciences at the University of Alberta. “We currently have information on over 3,000 pigs provided by a number of users, including academics and industry partners. With this system, we can quickly import, validate and cross-reference data from multiple sources.”
Given that most people store their data in Excel spreadsheets, why not use that same method to pass the information along?
A valid point
“The database lets us validate the material when we import it. We can confirm that if a value is required in a certain field, it’s there; if a number should be between 1 and 100, it is. Ultimately, our focus is ensuring that the data is accurate and makes sense.”
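The kind of import check Grant describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`animal_id`, `score`), the `FieldRule` class and the `validate_row` function are hypothetical and are not taken from the project's actual database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldRule:
    """One validation rule for a column (hypothetical structure)."""
    name: str
    required: bool = False
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def validate_row(row: dict, rules: list) -> list:
    """Return a list of error messages for one imported row (empty if valid)."""
    errors = []
    for rule in rules:
        value = row.get(rule.name)
        # Treat a missing key, None, or a blank string as "no value".
        if value is None or str(value).strip() == "":
            if rule.required:
                errors.append(f"{rule.name}: required value is missing")
            continue
        # Range checks apply only when the rule defines a bound.
        if rule.min_value is not None or rule.max_value is not None:
            try:
                number = float(value)
            except (TypeError, ValueError):
                errors.append(f"{rule.name}: {value!r} is not a number")
                continue
            if rule.min_value is not None and number < rule.min_value:
                errors.append(f"{rule.name}: {number} is below {rule.min_value}")
            if rule.max_value is not None and number > rule.max_value:
                errors.append(f"{rule.name}: {number} is above {rule.max_value}")
    return errors


# Assumed example rules: an ID that must be present, a score between 1 and 100.
RULES = [
    FieldRule("animal_id", required=True),
    FieldRule("score", min_value=1, max_value=100),
]

if __name__ == "__main__":
    for row in [{"animal_id": "A1", "score": "50"},
                {"animal_id": "", "score": "150"}]:
        print(row, validate_row(row, RULES))
```

Rows that come back with a non-empty error list would be held back and referred to whoever supplied the spreadsheet, rather than being entered into the master data.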
The system allows for multiple users to view the same data at the same time. Just as importantly, it can display subsets of data so that certain people can only access certain information. For example, PigGen Canada members who have supplied pigs for the project may see details on only their specific animals.
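Showing each member only their own animals amounts to filtering rows by who supplied them. The Python sketch below is one simple way to do that, not a description of how the project's system is built; the `supplier`, `organization` and `is_admin` names and the sample records are invented for illustration.

```python
def visible_rows(rows, user):
    """Return the records a user may see (hypothetical access rule).

    Administrators see everything; other users see only the animals
    their own organization supplied.
    """
    if user.get("is_admin"):
        return list(rows)
    return [row for row in rows if row["supplier"] == user["organization"]]


# Invented sample data, for illustration only.
PIGS = [
    {"animal_id": "A1", "supplier": "Farm X"},
    {"animal_id": "A2", "supplier": "Farm Y"},
    {"animal_id": "A3", "supplier": "Farm X"},
]

if __name__ == "__main__":
    member = {"name": "member", "organization": "Farm X"}
    print([row["animal_id"] for row in visible_rows(PIGS, member)])
```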
From a security standpoint, everything in the system is protected by user accounts. To safeguard data integrity, the design is “read only.” Participants must send their data to Dr. Grant for verification and entry so that the values aren’t skewed by user input errors.
Discussions about policies and processes for data storage are important, but they raise a critical question: What sort of data are we talking about?
Data diversity
“We store all sorts of information such as animal characteristics, sire and dam identity, performance data, carcass attributes, animal weights, ultrasound measurements and health details. We also keep all data concerning daily feed and water intake for each animal. Down the road, we will have more phenotypic data for the animals on how they respond to various treatments and exposures.”
As important as it is to have the right quality and quantity of data, the need for storing it properly can’t be overstated.
“It’s critical to have one place you can go where you know the data is up to date. If you’re just passing around a spreadsheet and someone finds an error, they must correct it and then ensure that everyone with that spreadsheet gets the revised version. With our system, I just make the change on the master data and all users can then access the correct version.”
Also, the system’s design should make it easy to add new data tables as the project progresses.
Of course, just as genomic research can be labor intensive, organizing the results is no small feat.
“Assembling this database took a lot of work,” said Grant. “We needed to have the proper scripts so we could import data from different spreadsheets. When we find mistakes in the data, we must follow up with the party who provided it for clarification or additional data. It’s all about communication, and having everything go through one person is critical.”
For an industry that often curses the latest pork prices, anything that makes for healthier pigs and profits would be a blessing indeed.