July 18, 2024


The Internet Generation

How to count by group in R

Counting by multiple groups, sometimes called crosstab reports, can be a useful way to look at data ranging from public opinion surveys to medical tests. For example, how did people vote by gender and age group? How many software developers who use both R and Python are men vs. women?

There are lots of ways to do this kind of counting by categories in R. Here, I'd like to share some of my favorites.
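To make the idea of a crosstab concrete, here is a minimal sketch using base R's table() function on a tiny made-up data frame. The devs data and its values are hypothetical and only mirror the column names used later; they are not real survey results, and this is not necessarily one of the approaches covered in the rest of the article.

```r
# Hypothetical toy data, for illustration only (not real survey results)
devs <- data.frame(
  Gender        = c("Man", "Woman", "Man", "Man", "Woman", "Man"),
  LanguageGroup = c("Both", "Both", "R", "Python", "Neither", "Both"),
  stringsAsFactors = FALSE
)

# One-way count: how many respondents fall into each LanguageGroup
table(devs$LanguageGroup)

# Two-way crosstab: rows are LanguageGroup, columns are Gender
counts <- table(devs$LanguageGroup, devs$Gender)
counts

# Men vs. women among developers who use both R and Python
counts["Both", ]
```

The two-way table answers the "how many R-and-Python users are men vs. women" question directly: each cell is the number of rows with that combination of values.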

For the demos in this article, I'll use a subset of the Stack Overflow Developer Survey, which asks developers about dozens of topics ranging from salaries to the technologies they use. I'll whittle it down to columns for languages used, gender, and whether they code as a hobby. I also added my own LanguageGroup column recording whether a developer reported using R, Python, both, or neither.
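One way a LanguageGroup column like this could be derived is sketched below. It assumes LanguageWorkedWith lists languages separated by semicolons (e.g. "HTML/CSS;Java;Python"), and language_group() is a hypothetical helper, not necessarily how the column was actually built. Splitting on ";" and checking for an exact "R" entry avoids the false matches a plain grepl("R", ...) would produce on names such as "Ruby" or "Rust".

```r
# Hypothetical helper: classify a semicolon-separated language list
# as "R", "Python", "Both", or "Neither".
# Assumes the LanguageWorkedWith format "Lang1;Lang2;Lang3".
language_group <- function(langs) {
  # Split each response into individual language names
  parts <- strsplit(langs, ";", fixed = TRUE)
  # Exact membership tests, so "Ruby" or "Rust" do not count as "R"
  uses_r  <- vapply(parts, function(p) "R" %in% p, logical(1))
  uses_py <- vapply(parts, function(p) "Python" %in% p, logical(1))
  out <- ifelse(uses_r & uses_py, "Both",
           ifelse(uses_r, "R",
             ifelse(uses_py, "Python", "Neither")))
  # Keep missing responses missing rather than labeling them "Neither"
  out[is.na(langs)] <- NA_character_
  out
}

examples <- c("HTML/CSS;Java;JavaScript;Python", "R;SQL", "Python;R", "HTML/CSS;Ruby")
language_group(examples)
# returns "Python" "R" "Both" "Neither"
```

On a full data frame, the same helper could be applied as df$LanguageGroup <- language_group(df$LanguageWorkedWith), where df is a placeholder name.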

If you'd like to follow along, the last page of this article has instructions on how to download and wrangle the data to get the same data set I'm using.

The data has one row for each survey response, and the four columns are all character vectors.

'data.frame':83379 obs. of  4 variables:
 $ Gender            : chr  "Man" "Man" "Man" "Man" ...
 $ LanguageWorkedWith: chr  "HTML/CSS;Java;JavaScript;Python" "C++;HTML/CSS;Python" "HTML/CSS" "C;C++;C#;Python;SQL" ...
 $ Hobbyist          : chr  "Yes" "No" "Yes" "No" ...
 $ LanguageGroup     : chr  "Python" "Python" "Neither" "Python" ...

I filtered the raw data to make the crosstabs more manageable, including removing missing values and keeping only the two largest gender categories, Man and Woman.
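A base R sketch of that kind of filtering is below. The raw data frame and its values are hypothetical stand-ins, and the exact steps used to prepare the article's data set may differ. complete.cases() drops any row with a missing value, and %in% keeps only the two gender categories of interest.

```r
# Hypothetical raw data with missing values and an extra gender category
raw <- data.frame(
  Gender        = c("Man", "Woman", NA, "Non-binary", "Man"),
  LanguageGroup = c("Both", "Python", "R", "Neither", NA),
  stringsAsFactors = FALSE
)

# Drop rows that have a missing value in any column
filtered <- raw[complete.cases(raw), ]

# Keep only respondents whose Gender is "Man" or "Woman"
filtered <- filtered[filtered$Gender %in% c("Man", "Woman"), ]
filtered
```

Doing the NA removal first means the %in% test never has to deal with missing values, so the two steps stay simple and independent.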