Cleaning and Organizing Environmental Data

_Lead image: Clean Data by Gene Stroman from the [Noun Project](https://thenounproject.com), CC BY_

After you’ve collected environmental data from a sensor, monitor, or other piece of equipment, one of the next steps is to organize and “clean” it! Cleaning includes making sure the dataset is complete and consistent. Organizing the data into a meaningful table gets it ready for charts, graphs, and other visualizations. Below are some resources on cleaning data, including making tables of tidy data.

### Making tables of tidy data

_Images: Illustrations from the [Openscapes](https://www.openscapes.org/) blog “[Tidy Data for reproducibility, efficiency, and collaboration](https://www.openscapes.org/blog/2020/10/12/tidy-data/)” by Julia Lowndes and Allison Horst, [CC BY](http://creativecommons.org/licenses/by/4.0/)_

An example of “tidy data” from an air quality sensor might look like this:

_**Each variable forms a column**_: sensor ID number, date, time, and the particulate matter (PM) measurement are individual variables. Each variable gets its own column in the table. The column header at the top lists the variable name and its units of measurement.

_**Each observation forms a row**_: this sensor took an air quality measurement every minute. Each measurement gets its own row in the table.

_**Each cell is a single measurement**_: each block in the table holds one piece of data, such as one time or one PM measurement.
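The three tidy-data rules can also be sketched in code. This is a minimal sketch using only the Python standard library, assuming readings arrive as (sensor ID, timestamp, PM2.5 value) tuples. The readings, the column names, the µg/m³ unit, and the helper functions `to_tidy_rows`, `missing_minutes`, and `write_csv` are all hypothetical, chosen for illustration. The sketch writes one row per observation with one value per cell, then runs a simple completeness check by listing minutes that have no reading.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical readings: (sensor ID, ISO 8601 timestamp, PM2.5 value).
# These numbers are invented for illustration, not real measurements.
RAW_READINGS = [
    ("S01", "2020-10-12T09:00", 12.1),
    ("S01", "2020-10-12T09:01", 11.8),
    ("S01", "2020-10-12T09:03", 13.0),
]

# Each variable forms a column; the header names the variable and its
# units (µg/m³ is an assumed unit for this example).
HEADER = ["sensor_id", "date", "time", "pm25_ug_per_m3"]


def to_tidy_rows(readings):
    """Turn raw tuples into tidy rows: one observation per row,
    one value per cell (date and time split into separate columns)."""
    rows = []
    for sensor_id, stamp, pm in readings:
        t = datetime.fromisoformat(stamp)
        rows.append([sensor_id, t.date().isoformat(), t.strftime("%H:%M"), pm])
    return rows


def missing_minutes(rows, sensor_id):
    """Completeness check: list the minutes between one sensor's first
    and last reading that have no row in the table."""
    times = sorted(
        datetime.fromisoformat(f"{date}T{time}")
        for sid, date, time, _ in rows
        if sid == sensor_id
    )
    if not times:
        return []
    present = set(times)
    missing = []
    current = times[0]
    while current <= times[-1]:
        if current not in present:
            missing.append(current.strftime("%Y-%m-%d %H:%M"))
        current += timedelta(minutes=1)
    return missing


def write_csv(rows):
    """Write the tidy table as CSV text with the header row first."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


rows = to_tidy_rows(RAW_READINGS)
print(write_csv(rows))
print("Missing minutes:", missing_minutes(rows, "S01"))
```

With these sample readings there is no 09:02 measurement, so `missing_minutes(rows, "S01")` returns `["2020-10-12 09:02"]`. A gap like this is worth noting before making charts, since a plotting tool may simply draw a line across it.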
### Cleaning data

_More to come here!_

### Questions on organizing and cleaning data

Questions tagged with `question:data-cleaning` will appear here:

[questions:data-cleaning]

### Activities

Activity posts tagged with `activity:data-cleaning` will appear here:

[activities:data-cleaning]

### More resources on organizing and cleaning data

+ [Formatting Data Tables in Spreadsheets](https://datacarpentry.org/spreadsheets-socialsci/01-format-data/) and [OpenRefine for Data Cleaning](https://datacarpentry.org/openrefine-socialsci/): guidance and exercises from a [workshop session](https://marwahaha.github.io/2019-05-30-nas/) by Data Carpentry.
+ “[Clean Up Messy Data](https://handsondataviz.org/clean.html),” chapter 4 from the open-access web edition of _Hands-On Data Visualization: Interactive Storytelling from Spreadsheets to Code_, by Jack Dougherty and Ilya Ilyankou.
+ Wickham, H. 2014. **Tidy Data**. Journal of Statistical Software, 59(10), 1–23. [LINK to paper](https://doi.org/10.18637/jss.v059.i10)

