_Lead image: Clean Data by Gene Stroman from the [Noun Project](https://thenounproject.com), CC BY_

After you’ve collected environmental data from a sensor, monitor, or other piece of equipment, one of the next steps is to organize and “clean” it! Cleaning includes making sure the dataset is complete and consistent. Organizing the data into a table in a meaningful way gets it ready for making charts, graphs, and other visualizations. Below are some resources on cleaning data, including making tables of tidy data.

### Making tables of tidy data

_Images: Illustrations from the [Openscapes](https://www.openscapes.org/) blog “[Tidy Data for reproducibility, efficiency, and collaboration](https://www.openscapes.org/blog/2020/10/12/tidy-data/)” by Julia Lowndes and Allison Horst, [CC BY](http://creativecommons.org/licenses/by/4.0/)_

An example of “tidy data” from an air quality sensor might look like this:

+ _**Each variable forms a column**_: sensor ID number, date, time, and the air quality measurement of particulate matter are individual variables. Each variable gets its own column in the table. The column header at the top lists the variable name and its units of measurement.
+ _**Each observation forms a row**_: this sensor took an air quality measurement every minute. Each measurement gets its own row in the table.
+ _**Each cell is a single measurement**_: each block in the table shows one piece of data: one time, one PM measurement, etc.
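The three rules above can be sketched in a few lines of Python. This is only an illustration, not a method from the resources linked on this page: the "wide" export format, the sensor names, the readings, and the choice of PM2.5 in µg/m³ are all made-up assumptions. The sketch turns a table that has one column per sensor into a tidy table with one row per measurement.

```python
import csv
import io

# Hypothetical "wide" export: one column per sensor, so the sensor ID
# is hidden inside the column headers. All values are invented.
WIDE_CSV = """date,time,sensor_A_pm25_ugm3,sensor_B_pm25_ugm3
2021-03-01,09:00,12.1,15.4
2021-03-01,09:01,12.3,15.0
"""

def to_tidy(wide_text):
    """Return tidy rows: one row per sensor per timestamp."""
    tidy = []
    for row in csv.DictReader(io.StringIO(wide_text)):
        for column, value in row.items():
            if column in ("date", "time"):
                continue  # these are already proper variables
            # Headers look like "sensor_A_pm25_ugm3"; pull out the ID.
            sensor_id = column.split("_")[1]
            tidy.append({
                "sensor_id": sensor_id,           # each variable is a column
                "date": row["date"],
                "time": row["time"],
                "pm25_ug_per_m3": float(value),   # each cell is one measurement
            })
    return tidy

tidy_rows = to_tidy(WIDE_CSV)
for r in tidy_rows:
    print(r)
```

Two timestamps and two sensors give four rows, each holding exactly one reading. Putting the units in the column name (`pm25_ug_per_m3`) follows the advice above about listing units in the header.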
### Cleaning data

_More to come here!_

### Questions on organizing and cleaning data

Questions tagged with `question:data-cleaning` will appear here

[questions:data-cleaning]

### Activities

Activity posts tagged with `activity:data-cleaning` will appear here

[activities:data-cleaning]

### More resources on organizing and cleaning data

+ [Formatting Data Tables in Spreadsheets](https://datacarpentry.org/spreadsheets-socialsci/01-format-data/) and [OpenRefine for Data Cleaning](https://datacarpentry.org/openrefine-socialsci/): guidance and exercises from a [workshop session](https://marwahaha.github.io/2019-05-30-nas/) by Data Carpentry.
+ “[Clean Up Messy Data](https://handsondataviz.org/clean.html),” chapter 4 from the open-access web edition of _Hands-On Data Visualization: Interactive Storytelling from Spreadsheets to Code_, by Jack Dougherty and Ilya Ilyankou.
+ Wickham, H. 2014. **Tidy Data**. Journal of Statistical Software, 59(10), 1–23. [LINK to paper](https://doi.org/10.18637/jss.v059.i10)....

Author | Comment | Last activity
---|---|---
keshavgarg234156 | "Whenever we have such large numbers of columns then we first perform correlation analysis before plotting the graphs. In correlation analysis, we c..." | almost 5 years ago
guolivar | "Normalising is your friend when you have things with very different ranges. Depending on what you're looking for you can divide each column by its ..." | about 5 years ago
eustatic | "Pivot tables? Lookup or index command? Conditional formatting? all of these things can help. i had to have 10 years and 14 plant plot growth s..." | about 5 years ago
warren | "I also noticed that some very high values make others basically unreadable. So having 2 y-axis scales can sometimes help, i guess, although that's ..." | about 5 years ago
warren | "Thanks! Yes, i started by removing things like firmware #, for sure. It's interesting to see how some values track each other roughly, while others..." | about 5 years ago
guolivar | "This is very much dataset, project and software dependent. Sometimes I remove columns, other times I combine them (averaging, adding, etc), other ..." | about 5 years ago
warren | "Hi @stevie, @Aleah, @Cbarnes9, and @crispinpierce, I posted this in follow up after the purple air datalogs ended up so dense and tough to read! Th..." | about 5 years ago
warren | "OK, i got the units to show. I trimmed back one file to just a few of the columns, just to make it more readable, but it doesn't have all the diffe..." | over 5 years ago
warren | "If you'd like, I can change it to just links for download! But it was interesting to see the data a bit in a graph. There are just so many fields, ..." | over 5 years ago
warren | "Hi, all, drag-and-drop for big CSVs is working again now. The graph that's generated is a bit wild because there are so many fields. But there is a..." | over 5 years ago
stevie | "Hi Jeff I have them, I also tried to get them up. I'll send them to you." | over 5 years ago
warren | "Crispin, could you email me the CSVs you're working with at jeff@publiclab.org so we can debug a bit? And, you had tried uploading them here in the..." | over 5 years ago
warren | "Hi @crispinpierce - are the 50 csv files each a 24-hour datalog? I'm going to test out uploading a CSV to help debug here, too, so hang on a sec: ..." | over 5 years ago
crispinpierce | "@jefffalk, @stevie, @Aleah, @Cbarnes9. A script that would remove the first row (column headers) and combine 50 csv files into a single Excel workb..." | over 5 years ago
crispinpierce | "Hi Stevie, Thanks for the quick follow-up. I had two windows open: the PublicLab conversation and the folder with the files. I dragged the files..." | over 5 years ago
warren | "There is also a "recover comment" button shown with a disk icon in the editor; it may still have your text? On Tue, Aug 27, 2019 at 10:00 AM \<..." | over 5 years ago
liz | "We might also find a time to jump on a video call, and do some screensharing to get on the same page (literally) and make some progress! Every Tues..." | over 5 years ago
jeffalk | "Hi @stevie, @Aleah, @Cbarnes9, and @crispinpierce, I may be jumping the gun here but since .csv files are openable by EXCEL and can be copied and ..." | over 5 years ago
stevie | "Hi Crispin - about getting these files up, sorry you're having trouble. Curious, are you dragging them into your email in reply or directly onto th..." | over 5 years ago
crispinpierce | "@stevie, @warren, @jeffalk, @Aleah and @Cbarnes9. Our issue with the data files is to be able to amalgamate numerous 24-hour csv files that are wri..." | over 5 years ago
liz | "Thank you very much @jeffalk ! Great narrative." | over 5 years ago
jeffalk | "Hello @liz; That might help if the someone, like me, had sense enough to look at the buttons down at the bottom (as I did not!). Here I'll try a..." | over 5 years ago
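
Several comments above ask about combining many 24-hour CSV datalogs into one file while dropping the repeated header rows. The following is a minimal sketch of that idea using only Python's standard library. It is not the script discussed in the thread, and it writes a single combined CSV rather than an Excel workbook; the function name `combine_csv_files` and the folder name in the usage note are hypothetical.

```python
import csv

def combine_csv_files(input_paths, output_path):
    """Append the rows of several CSV files into one, writing the header once.

    Assumes every file has the same columns in the same order.
    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(output_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in input_paths:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip completely empty files
                if header is None:
                    # Keep the header from the first file only.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(
                        f"{path} has different columns than the first file"
                    )
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For a folder of daily logs (here assumed to be called `datalogs`), a call might look like `combine_csv_files(sorted(Path("datalogs").glob("*.csv")), "combined.csv")` after `from pathlib import Path`. Write the combined file outside the input folder so it is not picked up as an input on the next run. The resulting CSV opens directly in a spreadsheet program.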