Virtualitics Research: Women in the Olympics

Created By Amanda Levy, 3/12/22


I only used athlete_events.csv.

noc_regions.csv could be used to understand the relationship between NOCs (National Olympic Committees) and countries.
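As a rough illustration of how noc_regions.csv could be combined with athlete_events.csv, the sketch below attaches a region name to each athlete row. It assumes the files have the column names `NOC` and `region`, as in the public version of this dataset; those names, the helper functions, and the sample rows are illustrative assumptions, not part of the original analysis.

```python
import csv
import io


def load_noc_regions(fileobj):
    """Build a lookup from NOC code to region name."""
    return {row["NOC"]: row["region"] for row in csv.DictReader(fileobj)}


def add_region(rows, noc_to_region):
    """Return copies of the athlete rows with a 'region' field added.

    NOC codes missing from the lookup get an empty region.
    """
    return [dict(row, region=noc_to_region.get(row["NOC"], "")) for row in rows]


# Made-up sample rows standing in for the two real files.
SAMPLE_REGIONS = "NOC,region,notes\nUSA,USA,\nFIN,Finland,\n"
SAMPLE_ATHLETES = "ID,Sex,NOC,Year\n1,F,USA,1996\n2,F,FIN,1952\n3,M,ZZZ,1980\n"

noc_to_region = load_noc_regions(io.StringIO(SAMPLE_REGIONS))
athletes = list(csv.DictReader(io.StringIO(SAMPLE_ATHLETES)))
for row in add_region(athletes, noc_to_region):
    print(row["NOC"], row["region"])
```

With the real files, the `io.StringIO(...)` objects would be replaced by `open("noc_regions.csv", newline="")` and `open("athlete_events.csv", newline="")`.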

Graphs (organized in folders according to the research questions below)

Research Questions

  1. Looking at overall participation color-coded by sex

    1. How has overall gender participation changed over the years between NOCs (just looking at the 15 NOCs mentioned below)?

    2. How has overall gender participation changed over the years between sports?

  2. Looking at impact of winter versus summer Olympics (season)

    1. How has overall gender participation changed over the years between winter and summer Olympics?

    2. How has overall gender participation changed over the years between winter and summer Olympics across NOCs?

  3. Looking at data filtered to female participation

    1. How has female participation changed over the years between NOCs?

    2. How has female participation changed over the years between sports?

  4. Looking at each of the top 15 NOCs in terms of female entries

    1. How has female participation changed over the years for this NOC by each sport?

      1. USA

      2. FRA

      3. GBR

      4. ITA

      5. GER

      6. CAN

      7. JPN

      8. SWE

      9. AUS

      10. HUN

      11. POL

      12. SUI

      13. NED

      14. URS

      15. FIN
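The "top 15 NOCs in terms of female entries" can be reproduced outside Virtualitics. The sketch below is one possible way to do it, assuming `Sex` and `NOC` columns as in the public version of athlete_events.csv and counting one entry per row (one athlete in one event). Whether the original ranking counted rows or unique athletes is not stated, so treat the counting rule, the function name, and the sample rows as assumptions.

```python
import csv
import io
from collections import Counter


def top_nocs_by_female_entries(rows, n=15):
    """Return the n NOC codes with the most rows where Sex is 'F'."""
    counts = Counter(row["NOC"] for row in rows if row["Sex"] == "F")
    return [noc for noc, _ in counts.most_common(n)]


# Made-up sample rows; the real input would be athlete_events.csv.
SAMPLE = """ID,Sex,NOC,Sport
1,F,USA,Swimming
1,F,USA,Gymnastics
2,F,FRA,Fencing
3,M,USA,Swimming
4,F,USA,Athletics
5,F,FRA,Athletics
6,F,FIN,Athletics
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
top_two = top_nocs_by_female_entries(rows, n=2)
print(top_two)
```

Calling the function with the default `n=15` on the full file would give a candidate list to compare against the 15 NOCs listed above.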

Suggested Full-Length Class Activity

  • Look at all six of the graphs in the first three categories

  • Look at assigned graph(s) in the fourth category

  • In the wiki, write one takeaway from each graph

Collaboration Aspect:

  • Everyone gathers takeaways from the same first six graphs

  • Each person/group gains insight and expertise on the NOC they examine for the fourth category, and it is the collaborative combination of all these takeaways that lets the group understand more deeply the two graphs in the third category

Advice and Preliminary Findings From Modified Class Activity


  • The online Virtualitics documentation is weak, especially with regard to what data types the dataset needs for each feature and plot type to work. For example, having data on countries does not mean that the 3D map will work. From the videos I saw online and my failed trial-and-error attempts, it seems that longitude and latitude are needed for that.

  • Additionally, it can be challenging to manipulate the dataset once it is uploaded into Virtualitics, which makes it hard to satisfy the data type requirements of the different plot types. For example, a histogram can take a categorical variable and show its frequency as the height of the bars, but getting a line plot to show the count of a filtered categorical variable proved extremely difficult and time-consuming. It requires a more advanced understanding of how to make additional columns of data in Virtualitics, convert variables to numerical representations, apply sums to those, and input the result as a data point. Virtualitics does not have detailed documentation on how to do this beyond listing data manipulation as a feature.

  • Virtualitics' smart mapping feature was extremely helpful for a quick initial orientation toward the best visualization approach for the data, and it is much faster to customize from an already completed format. It’s like looking at an example and working backwards, which is helpful in a trial-and-error system where many of the features are not compatible with the dataset. You get to start with what works in Virtualitics and swap out what does not fit your goals, and you ultimately end up changing fewer features than if you started from scratch. It also helps with brainstorming alternative perspectives on the data.

  • People initially look at the axis instead of the legend to identify the bars, which often results in misidentification. For example, Athletics was confused with Arts Competition. Additionally, people commented that it was hard to read the axis with so many categories, so having the bars color-coded was very helpful. This was the case when the data was filtered to female participation only (Q3 and Q4). Another helpful way to address the difficult-to-read axis is to have one person explain what the colors and axes are while a partner scales and rotates the graph in 3D. This is great because it makes the Virtualitics experience even more collaborative, even when limited by the number of licenses.

  • If there is a setup with a study administrator and a participant, then the most efficient approach is for the participant to keep the headset on throughout the entire analysis session. The study administrator switches between projects, following the folder sequence, and enters/exits VR mode each time. Doing so triggers the message "Please Take Off the VR Headset". The participant should ignore this message and keep the headset on.
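One possible workaround for the line plot difficulty described above is to compute the counts before uploading, so that Virtualitics receives a ready-made numeric column. The sketch below is only an illustration of that idea: it assumes `Sex` and `Year` columns as in the public version of athlete_events.csv, and the function names, the `FemaleEntries` column name, and the sample rows are hypothetical. It has not been tested inside Virtualitics.

```python
import csv
import io
from collections import Counter


def female_entries_per_year(rows):
    """Count rows with Sex == 'F' per year, sorted by year."""
    counts = Counter(int(row["Year"]) for row in rows if row["Sex"] == "F")
    return sorted(counts.items())


def write_counts(pairs, fileobj):
    """Write (year, count) pairs as a two-column CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(["Year", "FemaleEntries"])
    writer.writerows(pairs)


# Made-up sample rows; the real input would be athlete_events.csv.
SAMPLE = """ID,Sex,NOC,Year
1,F,USA,1996
2,F,FRA,1996
3,M,USA,1996
4,F,USA,2000
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
pairs = female_entries_per_year(rows)
out = io.StringIO()
write_counts(pairs, out)
print(out.getvalue())
```

To produce a file for upload, `out` would be replaced by a file opened with `open("female_entries_per_year.csv", "w", newline="")`; that file name is also just an example.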

Preliminary findings from the modified in-class activity:

  • Looking at Q1

    • It is a lot of data to represent succinctly in a graph. Participants noted that the clearest representation was the VR immersive bar graph. They noted that a line graph would be even messier, because the high number of lines crossing over each other would make it hard to track the progression through the dates. They found that showing men and women as a color breakdown within one bar, which conveys the total, the frequency, and the percentage breakdown, was the most comprehensive and efficient approach. They also mentioned that this view was digestible because participants could zoom in and rotate the graph. The participant using the VR immersive view did not find this graph overwhelming and eagerly jumped into the analysis by zooming and rotating the graph, whereas others who viewed the graph in the desktop application found the number of data points very overwhelming and had to take a moment to process before analyzing. Additionally, analysis in the desktop view is limited because spinning and zooming in on the graph is much slower and less well calibrated there than in the VR view.

    • Gymnastics and swimming are repeatedly the sports that grab people's attention

      • It makes sense that this comment recurs: the graphs go deeper and deeper into the same data to look at smaller subsets, so the overall takeaways should corroborate and build on each other.

  • Looking at Q2

    • There are a lot more women who compete in the Summer Olympics than in the Winter Olympics

    • Women's participation started to grow significantly around 1986

    • For "How has overall gender participation changed over the years between winter and summer Olympics across NOCs?", the color coding for the NOC breakdown was super helpful. Because the count is by individual and not by percentage of participating athletes for each country, it was expected before looking closely at the data that the US would be the clear leader in the number of women. The reasoning was that the US had the most athletes and few barriers to women competing, so the sheer number of women competing would be relatively high. The color coding showed that this hypothesis was incorrect. The US was represented by blue, which did not have a strikingly dominant presence in the breakdown. Additionally, the percentage of blue did not change over the years, which prompts concerns about the success of efforts to get girls involved in sports (Girls on the Run, etc.).

  • Looking at Q3

    • Athletics is a vague category, so I had to look at the original data to see what it represents. I checked which events were listed under Athletics and recognized that they were the Track & Field events.

    • As mentioned above, one disadvantage of Virtualitics is that there is no way to edit the data once it is uploaded as a CSV. Data manipulation, such as eliminating entries for less frequently hosted sports or renaming Athletics to Track & Field, must be completed outside of the application, and the result must be reloaded as a new CSV.

    • Because you cannot change the axis, it is hard to keep people's attention on the sports you are trying to answer questions about. People get distracted by the presence of sports such as Arts Competition. Thus, the setup where the sports or NOCs are filtered to show only the ones that we want to analyze (the top 15) is helpful. This is another reason that filtering to female participants is very helpful: it allows us to color-code the bars by sport or NOC instead of by the male/female breakdown.

  • Looking at Q4

    • CAN

      • It is hard to tell that this project is analyzing Canada, because I have not figured out how to add a title to a graph in Virtualitics. Thus, I think the best approach is to make the file structure as simple, clear, and explicit as possible. I believe I have done this, but it may need to be adjusted in the future if additional classes do not find it intuitive to navigate.

    • FIN

      • The 3D VR immersive experience of Virtualitics helps emphasize the height of each bar. This made it faster and easier to see that there are far fewer female participants from FIN than from the previously viewed country (CAN in this case). It is helpful that the bar heights are not proportionally scaled but are instead based on frequency counts. The raw number/height is more effective for comparisons.

      • When the heights are smaller and differences might be smaller, it is helpful that in the VR view, the participant can zoom in/out and twist around.

      • The most obvious takeaway was that the number of gymnastics participants was dramatically lower than in the overall graphs. FIN's contribution to the presence of women in gymnastics seemed lower than average, which suggests that the comparatively high number of gymnastics participants overall must result from very high female gymnastics participation in other countries.

        • An interesting comment is that China is not among the top 15 countries with the highest female participation, so it was not one of the NOCs that were analyzed in more depth or color-coded. Thus, the impact of its female gymnasts was not carefully analyzed in terms of how it could offset the lower impact from FIN.

        • To see another country that counteracts the lower number of gymnasts from FIN, see below (FRA).

    • FRA

      • See the note above about the number of female participants in gymnastics. FRA had a comparatively high number of female participants in gymnastics, both against other sports for FRA and against other countries.
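The Q3 notes describe checking which events fall under Athletics, and point out that edits such as renaming Athletics or filtering rows have to happen outside Virtualitics before reloading a new CSV. The sketch below shows one way those steps might look. It assumes `NOC`, `Sport`, and `Event` columns as in the public version of athlete_events.csv; the function names, the chosen NOC set, and the sample rows are hypothetical and only for illustration.

```python
import csv
import io


def events_for_sport(rows, sport):
    """List the distinct events recorded under one sport."""
    return sorted({row["Event"] for row in rows if row["Sport"] == sport})


def clean_rows(rows, keep_nocs, sport_renames):
    """Keep only rows for the given NOCs and rename sports as requested."""
    cleaned = []
    for row in rows:
        if row["NOC"] not in keep_nocs:
            continue
        new_sport = sport_renames.get(row["Sport"], row["Sport"])
        cleaned.append(dict(row, Sport=new_sport))
    return cleaned


def write_rows(rows, fileobj):
    """Write the cleaned rows back out as a CSV with a header."""
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample rows; the real input would be athlete_events.csv.
SAMPLE = """ID,Sex,NOC,Sport,Event
1,F,CAN,Athletics,Athletics Women's 100 metres
2,F,FIN,Athletics,Athletics Women's High Jump
3,F,CHN,Gymnastics,Gymnastics Women's Uneven Bars
4,F,FRA,Gymnastics,Gymnastics Women's Floor Exercise
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
athletics_events = events_for_sport(rows, "Athletics")
cleaned = clean_rows(rows, {"CAN", "FIN", "FRA"}, {"Athletics": "Track & Field"})
out = io.StringIO()
write_rows(cleaned, out)
print(out.getvalue())
```

The written output is what would be uploaded to Virtualitics as a new dataset. The original `rows` are left unchanged, so the same input can be cleaned in several different ways.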

Research Questions Visuals

See this subpage for research study question screenshots from Oculus Quest 2

Below is a video briefly showing the VR experience:

Virtualitics VR demo.mp4