Bar charts (or bar graphs) are commonly used, but they're also a simple type of graph where the defaults in ggplot leave a lot to be desired. This is a step-by-step description of how I'd go about improving them, describing the thought processes along the way. Every plot is different, and the decisions you make need to reflect the message you're trying to convey, so don't treat this post as a recipe. Treat it as a set of points to consider and, hopefully, a few tips that will help you achieve the look you want in your own plots.
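A minimal sketch of the kind of changes that often help, assuming ggplot2 3.3.0 or later and using the `mpg` data that ships with ggplot2 (the dataset and the specific tweaks are illustrative choices, not necessarily the steps this post takes): the default plot is compared with a version that uses horizontal bars sorted by size, removes the gap between the bars and the axis, and drops gridlines that carry no information.

```r
library(ggplot2)  # assumes ggplot2 >= 3.3.0 (for expansion() and horizontal geom_col)

# Count of cars per vehicle class in the mpg dataset bundled with ggplot2
class_counts <- as.data.frame(table(class = mpg$class))  # columns: class, Freq

# Default: vertical bars in alphabetical order, grey background, padding below the bars
p_default <- ggplot(class_counts, aes(x = class, y = Freq)) +
  geom_col()

# Illustrative improvements (one option among many)
p_improved <- ggplot(class_counts, aes(x = Freq, y = reorder(class, Freq))) +
  geom_col(fill = "steelblue", width = 0.7) +       # horizontal bars, sorted by count
  scale_x_continuous(expand = expansion(mult = c(0, 0.05))) +  # bars start at the axis
  labs(x = "Count", y = NULL) +                     # category labels need no axis title
  theme_minimal() +
  theme(panel.grid.major.y = element_blank())       # horizontal gridlines add nothing here

print(p_default)
print(p_improved)
```

Sorting the bars and flipping them horizontally makes the category labels readable and the ranking obvious; whether those choices suit your plot depends on the message you want it to carry.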
If you have a continuous measurement and two groups you'd like to compare based on that measurement, what's the first statistical test that comes to mind? Chances are it's the two-sample t-test, sometimes known as Student's t-test. It's typically the first statistical test taught in an introductory statistics course, it's well known and understood, and it has good theoretical properties. So if a t-test answers your research question, you should probably use it.
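As a minimal illustration in base R, using the built-in `mtcars` data (an example dataset chosen here, not one from the original discussion), the snippet below compares fuel economy between cars with automatic and manual transmissions. One detail worth knowing: R's `t.test()` runs Welch's t-test by default, which does not assume equal variances; the classical Student's version requires `var.equal = TRUE`.

```r
# Fuel economy (mpg) for automatic (am = 0) vs manual (am = 1) cars in mtcars

# Classical Student's two-sample t-test: assumes equal variances in both groups
student <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# R's default: Welch's t-test, which allows the group variances to differ
welch <- t.test(mpg ~ am, data = mtcars)

student  # df = n1 + n2 - 2 = 30 under the equal-variance assumption
welch    # df is estimated (Welch-Satterthwaite), so it is not a whole number
```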
Data that has some kind of hierarchical structure to it is very common in many fields, but is rarely discussed in introductory statistics courses. Terms used to describe this kind of data include hierarchical data, multi-level data, longitudinal data, split-plot designs or repeated measures designs. Statistical models used for these types of data include mixed-effects models (often abbreviated to just mixed models), repeated measures ANOVA and generalised estimating equations (GEEs).
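A minimal sketch of a mixed-effects model in R, assuming the `nlme` package (a recommended package included with standard R installations) and its `Orthodont` data, which are illustrative choices rather than anything from the original text. Each of 27 children was measured at ages 8, 10, 12 and 14, so measurements from the same child are correlated; a random intercept per child accounts for that clustering instead of treating all 108 rows as independent.

```r
library(nlme)  # recommended package shipped with standard R installations

# Orthodont: an orthodontic distance measured on 27 children at ages 8, 10, 12, 14
data(Orthodont)

# Linear mixed-effects model: fixed effect of age, random intercept for each child
fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

summary(fit)
fixef(fit)    # population-level intercept and slope for age
VarCorr(fit)  # between-child variance and within-child (residual) variance
```

The same random-intercept model can also be fitted with `lmer(distance ~ age + (1 | Subject), data = Orthodont)` from the `lme4` package, which is another widely used option.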
In exploratory data analysis, it's common to want to make similar plots of a number of variables at once. Here is a way to achieve this using R and