For a class statistics project, I choose to work with data from the Hugo Awards. Looking at the nominations data, I was curious to know the gender breakdown over the past ten or so years. This proved to be impossible because you cannot infer someone’s gender from their name; so, I did the next best thing and analyzed the breakdown of a nominated author’s pronouns. I ended up with eleven years of data because the 2020 nominations were released during my collection of the 2009-2019 data. As such, I figured more data is better and added the 2020 nominations to my data set.
What I found probably won’t surprise you, but how I got to my conclusions might. Skip to the results section if you want to get to the answers right away.
Methodology
Right, so how did I go about doing this? Well…
- My data source was the Hugo Awards website
- I pulled every nominated author from each relevant category into a spreadsheet
- To ascertain an author’s pronouns, I looked through an author’s twitter, Wikipedia page, or website
- Nominations of a work with multiple authors were excluded because it would violate my assumption of one set of pronouns per nomination. As such, my data is missing a few data points
- I only looked at the following categories because finding an author’s pronouns is labor intensive so I had to limit the scope of the project
- Best Novel
- Best Novella
- Best Novelette
- Each category yielded ~63 observational units.
- I used Python to do some minimal data processing and R for data visualization
- The GitHub repo with my dataset, code, and images is linked at the top of the post. It can also be found here!
Disclaimers
- The data entry was done by hand; as such, it is possible that there is an error in the data set
- I pruned nominations with multiple authors from my data set because it would violate my assumption of one set of pronouns per nomination
- I want to make it very clear that I am glad that the Hugo Awards have recently been more conscious about gender equity
- The data that I collected is wildly skewed. The number of people nominated that use she/her pronouns shown in this sample is by no means representative of the 60+ year Hugo dataset as a whole
- I also didn’t collect data on the short stories category because I had a finite number of human-hours to work on this project
- I didn’t check for pen names. Big thanks to GitHub user islemaster for pointing this out!
Results
![]() |
![]() |
---|---|
![]() |
![]() |
Some Stats
- Best Novel Nominations (63 nominations)
- There are 24 unique authors that use she/her pronouns
- There are 23 unique authors that use he/him pronouns
- There are 0 unique authors that use they/them pronouns
- Best Novella Nominations (63 nominations)
- There are 21 unique authors that use she/her pronouns
- There are 23 unique authors that use he/him pronouns
- There are 2 unique authors that use they/them pronouns
- Best Novelette Nominations (65)
- There are 28 unique authors that use she/her pronouns
- There are 26 unique authors that use he/him pronouns
- There is 1 unique author that uses they/them pronouns
- The three authors with the most nominations are
- Seanan McGuire (7)
Mira Grant (6)8/29/20 UPDATE: Turns out Mira Grant is a pen name for Seanan McGuire. Shout out to islemaster for catching this!- Aliette de Bodard (5)
Analysis
- On the surface, there looks to be a large improvement in representation for authors that use she/her pronouns
- Both Best Novel and Best Novelette have a strong positive correlation for authors who use she/her pronouns
- Best Novel takes this correlation to its natural conclusion in 2020 where all of the authors use she/her pronouns
- Best Novella has a weak positive correlation for authors who use she/her pronouns
- In both the Best Novella and Best Novelette categories, there is a strange blip in the 2015 data. This is a result of a malicious scheme by a group of internet misogynists to stuff the ballot box because they felt that too many people that didn’t have he/him pronouns were getting nominations.
- Both Best Novel and Best Novelette have a strong positive correlation for authors who use she/her pronouns
- The breakdown of nominations per author is very informative. Authors who use she/her pronouns had multiple nominations at a rate that far outstripped authors of he/him and they/them pronouns
- This tells us that they keep drawing from the same pool of authors that use she/her pronouns
- In other words, authors that use she/her pronouns need to be exceptional to receive a nomination, while those with he/him pronouns don’t need to rise to the same standard
- Just 19 authors that use she/her pronouns are responsible for 67 nominated works. That’s about a third of the nominations over the last 11 years coming from just 19 people! So while overall representation has improved for authors who use she/her pronouns, the spread of authors is dismally low.
- Representation for people who don’t use she/her or he/him pronouns is abysmal at just 3 out of 118 nominated authors
Final Thoughts and Takeaways
On his blog post about my project, Cory Doctorow explained the final takeaway far better than I could have: “a small number of prodigiously talented women dominate the she/her category – meaning the median male author has a much higher chance at a nomination than the median woman author.”
This project was far more limited in scope than I wanted because there is no computer algorithm to determine a person’s gender. If there was, I could totally do an analysis on all published Hugo Awards data with a simple web scraper and this magical yet-to-be-designed algorithm. There is a really excellent R package that shows the deep flaws in trying to use math/stats/CS to infer someone’s gender. It’s called genderendr and I highly recommend checking it out.