We scientists live immersed in data. It’s what most of us do – gather it, interpret it, and report on what we find. Where does the data come from and what does it tell us?
Think about all the data we can get from a single mineral grain. A rock might sit around for billions of years – or millions – or tens of thousands. It weathers and falls apart, and its grains get washed down the hill, perhaps into a stream, where one grain bounces along the bottom or gets caught up in a swollen river. Eventually the rains stop or the stream runs into a still pool of water, and more sediment covers this little grain, burying it beneath a thick pile. Over time it becomes part of a rock again, and billions or millions or tens of thousands of years later, a geologist comes along, picks it up and describes it. She writes down her observations: where she got it, what the outcrop looks like, what day it was. That is the start of the data collection. Where was it sitting, what does it look like, how does it relate to the other rocks in the area?
She takes it back to the laboratory, crushes it, perhaps measures its chemistry, and takes more data. She finds some interesting minerals and plucks them from the crushed rock, puts them on a slide and into a machine that measures their chemistry or perhaps their isotopes, telling her how old each grain is. These data are the ties that link the person today to what happened in the past, possibly billions of years ago.
If she is clever and persevering, she writes a paper about that grain (and perhaps its friends) and it gets published in a journal. Some librarian takes the data from the journal and puts it into a catalogue. If her organization has a museum, she might deposit the grain in its collection, and the museum curator would put more data about the grain into their database: who uses it, where the grain is stored… on and on it goes.
How else do we look at data? We have data on students – their courses, their grades, where they live. We have data on the people who live now and move across the planet. We have data on the diversity of plants and animals – although this body of data is very incomplete. How little data we have on the diversity of life on this planet – even as we humans are decreasing that diversity.
I have spent my life dealing with data: geochemical, mineralogical, geodetic, test scores, student grades, demographics, and more. There are also all the things we do to data and for data: write, label, store, digitize, spend money, categorize, visualize, archive…on and on it goes.
And how do we access this data? I learned about data products when I worked in an organization whose main function was to acquire and store geodetic data. First-order data was that which came straight off the machine – the instrument designed to collect it. It was one person’s job, his life’s work, to turn that into second-order data – numbers that could then be used by techies and scientists. Third-order data was something more easily interpreted by a variety of people who might want to use it – in this case, time series of the Earth’s movement. And in the education business, we wanted to turn the data into fourth-order data – visual, or easily accessible to a wide range of audiences, including you and me.
Some data is inaccessible to us from the very beginning – buried in the Earth, eroded, weathered, removed through anthropogenic means… And then, once we humans have interpreted it, the data can be buried again in archives, degraded, destroyed, or made inaccessible through lack of resources. Inequality exists even in our digital age.
Who has access to the education needed to interpret all this data? Figuring out what the information in one little grain tells us is hard enough. How do we figure out what terabytes of data tell us?