Author: David Spiegelhalter
Publication date: 1 February 2020
Publisher: Penguin Books
Number of pages: 448
Review
This is an excellent book for understanding the basics of statistics: why it matters, the mistakes people commonly make, and how badly things can go wrong when we take the choice of analysis method too lightly. David Spiegelhalter explains all of this with popular, relatable topics drawn from everyday life.
As someone who had a bad experience learning statistics, I can say this book would be perfect for an introductory statistics class. It goes beyond theory and highlights the critical stages of the data problem-solving cycle: Problem, Plan, Data, Analysis, and Conclusions (PPDAC). Most importantly, it points out the danger of systematic biases when interpreting the results of data analysis, using common yet critical topics from our daily lives.
After reading it, I feel I should have read this book instead of attending statistics class a decade ago. Its practical approach, with an interesting topic in each section, helped me understand a great deal about statistics. The way David explains every statistical method is brilliant.
This is not the kind of book that offers mindfulness to help the reader relax or de-stress, yet I enjoyed it so much that I could barely put it down and finished it within 2 days. I recommend it not only to data scientists and anyone who wants to learn about statistics, but to all of us living in the age of big data, where the media sometimes interprets that data wrongly.
Highlights
- If we are to use statistical science to illuminate the world, then our daily experiences have to be turned into data, and this means categorizing and labelling events, recording measurements, analyzing the results and communicating the conclusions.
- Statistics are always to some extent constructed on the basis of judgement, and it would be an obvious delusion to think the full complexity of personal experience can be unambiguously coded and put into a spreadsheet or other software.
- Data has two main limitations as a source of such knowledge:
  - It is almost always an imperfect measure of what we are really interested in.
  - Anything we choose to measure will differ from place to place, from person to person, from time to time, and the problem is to extract meaningful insights from all this apparently random variability.
- We want our data to be:
  - Reliable, in the sense of having low variability from occasion to occasion, and so being a precise or repeatable number.
  - Valid, in the sense of measuring what you really want to measure, and not having a systematic bias.
- Once we want to start generalizing from the data — learning something about the world outside our immediate observations — we need to ask ourselves the question, ‘Learn about what?’ And this requires us to confront the challenging idea of inductive inference.
- Causation, in the statistical sense, means that when we intervene, the chances of different outcomes are systematically changed.
- Researchers spend their lives scrutinizing the type of computer output…, hoping to see the twinkling stars indicating a significant result which they can then feature in their next scientific paper. But, as we now see, this sort of obsessive searching for statistical significance can easily lead to delusions of discovery.
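
To convince myself of that last highlight, I wrote a small simulation. It is my own rough sketch, not code from the book, and every function name and parameter in it is made up for illustration. Each "study" compares two groups drawn from the same distribution, so there is no real effect to find. I assume a simple z-test with a known standard deviation of 1 and the usual 0.05 threshold.

```python
import math
import random


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def null_comparison_p_value(rng, n_per_group=50):
    """Compare two groups drawn from the SAME distribution (no real effect)."""
    a = [rng.gauss(0, 1) for _ in range(n_per_group)]
    b = [rng.gauss(0, 1) for _ in range(n_per_group)]
    diff = sum(a) / n_per_group - sum(b) / n_per_group
    # Assumption: the standard deviation (1) is known, so a z-test applies.
    standard_error = math.sqrt(2 / n_per_group)
    return two_sided_p_value(diff / standard_error)


def share_with_a_significant_result(n_studies, tests_per_study, alpha=0.05, seed=0):
    """Fraction of studies in which at least one test falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = [null_comparison_p_value(rng) for _ in range(tests_per_study)]
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


# One honest test per study versus hunting through 20 outcomes per study.
single = share_with_a_significant_result(n_studies=2000, tests_per_study=1)
twenty = share_with_a_significant_result(n_studies=500, tests_per_study=20)
print(f"one test per study:  {single:.2f} of studies look significant")
print(f"20 tests per study:  {twenty:.2f} of studies look significant")
```

With one test per study, the share of false alarms should land near 0.05. With 20 independent tests per study, theory says the chance of at least one "significant" result is 1 - 0.95^20, or about 0.64, even though nothing real is going on. That is the delusion of discovery in miniature.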