I am a statistical consultant at the University of Melbourne, working with researchers at the University and clients outside the University. This can involve all stages of quantitative research: designing experiments or surveys, planning an appropriate analysis, analysing data, making graphs and communicating results clearly in papers or reports.
My blog here is a collection of my thoughts as a practising statistician. Some of my posts are intended as resources for clients. Other posts will be more of interest to fellow statisticians.
When I'm not thinking about statistics, I try to spend time out in nature—hiking, trail running, cycling, or camping.
Master of Science in Mathematics and Statistics, 2016
University of Melbourne
Bachelor of Science in Pure Mathematics, 2009
University of Western Australia
Bar charts (or bar graphs) are commonly used, but they’re also a simple type of graph where the defaults in ggplot leave a lot to be desired. This is a step-by-step description of how I’d go about improving them, describing the thought processes along the way. Every plot is different and the decisions you make need to reflect the message you’re trying to convey, so don’t treat this post as a recipe; treat it as some points to consider and, hopefully, a few tips that will help you achieve the look you want in your own plots.
If you have a continuous measurement and two groups you’d like to compare based on that measurement, what’s the first statistical test that comes to mind? Chances are it’s the two-sample t-test, sometimes known as Student’s t-test. It’s typically the first statistical test taught in an introductory statistics course, it’s well known and understood, and it has good theoretical properties—so if a t-test answers your research question, you should probably use it.
Data that has some kind of hierarchical structure to it is very common in many fields, but is rarely discussed in introductory statistics courses. Terms used to describe this kind of data include hierarchical data, multi-level data, longitudinal data, split-plot designs or repeated measures designs. Statistical models used for these types of data include mixed-effects models (often abbreviated to just mixed models), repeated measures ANOVA and generalised estimating equations (GEEs).
In exploratory data analysis, it's common to want to make similar plots of a number of variables at once. Here is a way to achieve this using R and ggplot2.