Chapter 1 - Setting the Pace: What is Bad Data?

The way the book is organized:

Guidance for Grubby, Hands-on Work
- You can’t assume that a new dataset is clean and ready for analysis. Ch. 2 offers several techniques to take the data for a test drive.
- Ch. 3 (Data Intended for Human Consumption, Not Machine Consumption): Ways to extract data from spreadsheets into something more usable.
- Ch. 4 is about character encoding problems and how to handle them.
- Ch. 5 walks you through everything that can go wrong in a web-scraping effort.
Data That Does the Unexpected
- Ch. 6: Using Natural Language Processing (NLP) to detect liars and the confused.
- Ch. 9: “When Data and Reality Don’t Match”
Approach
- Ch. 8: Advice to data scientists from a software developer’s perspective. Note from Sergio: why would you name it “Blood, Sweat, and Urine”?
- Ch. 7: Is there such a thing as truly bad data?
- Ch. 10: How you collect your data determines what will hurt you (bias and error).
- Ch. 11: How dirty data will give your classical statistics training a harsh reality check.
Data Storage and Infrastructure
- Ch. 13: How you store your data weighs heavily on how you can analyze it. Spotting graph data in a relational database.
- Ch. 14: Dissecting assumptions on cloud computing’s scalability and flexibility. Note from Sergio: this book is from 2013, so I expect this to be outdated.
- Ch. 12: When to stick to files instead of databases.
The Business of Data
- Ch. 16: How to outsource machine learning.
- Ch. 15: Several worst practices to avoid when it comes to corporate bureaucracy and policy.
Data Policy
- Ch. 17: Sure, you know what methods you used, but do you truly understand how those final figures came to be? Food for thought for your data processing pipelines.
- Ch. 18: Looks to the future of social media, and thinks through a much-needed recall feature. Note from Sergio: Again, this is from 2013…
- Ch. 19: How to assess your data’s quality, and how to build a structure around a data quality effort.

Note from Sergio: this book is v white.
