Complete all of this section before starting any milestone. It should take roughly 2–3 hours the first time, mostly waiting on downloads.
| Tool | Version | Where / Notes |
| --- | --- | --- |
| Python | 3.10 or 3.11 | https://python.org — verify: python --version |
| pip packages | latest | pip install torch numpy umap-learn matplotlib scikit-learn tqdm (see the import check below) |
| Unity Hub | latest | https://unity.com/download |
| Unity Editor | 2022.3 LTS | Install via Unity Hub → Installs → Add |
| Android Build Support | via Unity Hub | Unity Hub → Installs → your version → Add Modules → Android Build Support + NDK + SDK |
| Meta XR All-in-One SDK | latest | Unity Asset Store: search 'Meta XR All-in-One SDK' |
| VS Code | latest | https://code.visualstudio.com — add Python + C# extensions |
| Git LFS | latest | https://git-lfs.com — run: git lfs install |
| scrcpy (Quest mirror) | latest | https://github.com/Genymobile/scrcpy — mirrors the Quest screen to your PC |
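Before moving on, confirm the pip packages import cleanly. A minimal sketch (the file name is hypothetical; note that umap-learn imports as umap and scikit-learn as sklearn):

# check_env.py (hypothetical helper, not part of the project scripts)
import importlib

for pkg in ('torch', 'numpy', 'umap', 'matplotlib', 'sklearn', 'tqdm'):
    try:
        importlib.import_module(pkg)
        print(f'OK    {pkg}')
    except ImportError as exc:
        print(f'FAIL  {pkg}: {exc}')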
GloVe (Global Vectors for Word Representation) is the embedding source for this project. The glove.6B.zip download is 862 MB; the 300-dimension file inside it is the one this project uses.
# 1. Go to: https://nlp.stanford.edu/projects/glove/
# 2. Download: glove.6B.zip (862 MB)
# 3. Unzip to get: glove.6B.300d.txt
# (other sizes: 50d, 100d, 200d also in the zip — only use 300d)
# Verify the file:
wc -l data/glove.6B.300d.txt
# Expected: 400000 data/glove.6B.300d.txt  (400K lines, one per word)
# Check the first line:
head -1 data/glove.6B.300d.txt
# Expected: 'the 0.418 0.24968 -0.41242 ...' (word followed by 300 floats)
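If wc and head are unavailable (plain Windows, for example), the same checks in Python:

# Assumes the file has been unzipped to data/glove.6B.300d.txt
with open('data/glove.6B.300d.txt', encoding='utf-8') as f:
    first = f.readline().split()
    n_lines = 1 + sum(1 for _ in f)
print('first word:', first[0])        # Expected: the
print('dimensions:', len(first) - 1)  # Expected: 300
print('total lines:', n_lines)        # Expected: 400000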
# Run from wherever you want the project to live:
mkdir word_embeddings_ar
cd word_embeddings_ar
mkdir data scripts output unity_export evaluation wiki
# Place your GloVe file:
# Move/copy glove.6B.300d.txt into word_embeddings_ar/data/
# Final structure should look like this:
word_embeddings_ar/
data/
glove.6B.300d.txt <- GloVe file (862 MB)
scripts/
word_categories.py <- Step 2
load_glove.py <- Step 3
generate_embeddings.py <- Step 4
plot_2d.py <- Step 5
check_similarity.py <- Step 6
export_for_unity.py <- Step 7
output/ <- Generated files go here
unity_export/ <- Copies of files for Unity
evaluation/ <- Survey forms, task sheets
wiki/ <- Wiki page drafts
Do this once before the Feb 12 milestone. You will open and build on this project through the entire timeline.
Open Unity Hub. Click New Project.
Select template: 3D (Core). Do NOT pick URP or HDRP — we configure rendering manually.
Project Name: WordEmbeddingsAR
Location: somewhere outside your Python project folder.
Unity version: 2022.3 LTS (critical for Meta XR compatibility).
Click Create Project and wait (~3 min).
In Unity: File → Build Settings → select Android → click Switch Platform. Wait ~5 min.
Create these folders in the Project panel (Assets root): Scripts, Data, Materials, Prefabs, Scenes, Shaders, UI, Audio.
File → Save Scene As → Assets/Scenes/EmbeddingViz.unity.
Window → Package Manager. Use the search bar for each:
| Package | Source | Notes |
| --- | --- | --- |
| TextMeshPro | Unity Registry (search) | Click Import TMP Essentials when prompted after install |
| XR Plugin Management | Unity Registry (search) | Required for all XR features |
| OpenXR Plugin | Unity Registry (search) | The cross-platform XR backend |
| Meta XR All-in-One SDK | My Assets tab (after Asset Store import) | Brings in OVRCameraRig, Passthrough, Hand Tracking, etc. |
| Newtonsoft Json (optional) | Unity Registry (search 'Json') | Easier JSON parsing than JsonUtility; optional but helpful |
After importing Meta XR SDK
A 'Meta XR Configuration' window will pop up. Click 'Fix All' to apply recommended project settings automatically.
This sets Android API level, removes stereo rendering overrides, and enables required permissions.
You may need to restart Unity after this step.
Edit → Project Settings → XR Plug-in Management → Android tab → check OpenXR.
Click the yellow warning triangle next to OpenXR → resolve any listed issues.
Under OpenXR → Feature Groups → enable: Meta Quest Support, Hand Tracking, Passthrough.
Edit → Project Settings → Player → Android tab:
Company Name and Product Name: fill in your info
Other Settings → Minimum API Level: Android 10 (API 29)
Scripting Backend: IL2CPP
Target Architectures: ARM64 only (uncheck ARMv7 and any x86 variants)
Still in Player → Other Settings → Color Space: Linear (important for correct colors in passthrough).
Graphics APIs: remove Vulkan if present, keep OpenGLES3 only.
| Item | Detail |
| --- | --- |
| Source | Stanford NLP — https://nlp.stanford.edu/projects/glove/ |
| File | glove.6B.300d.txt |
| Vocabulary | 400,000 words, trained on 6 billion tokens from Wikipedia + Gigaword |
| Dimensions | 300 (standard; good balance of coverage vs. computation) |
| Why GloVe over word2vec | Pre-trained file format is simpler (plain text); no special loading library needed |
| Why GloVe over BERT | Static embeddings are easier to visualize; one vector per word, not context-dependent |
| Why 300d over 50d | More semantic structure preserved; richer spatial geometry in 3D after reduction |
Four semantic categories were chosen to maximize visual cluster separation while testing meaningful NLP concepts:
| Category | Words | Color |
| --- | --- | --- |
| emotions | joy anger fear sadness love hate happiness grief anxiety hope disgust surprise pride shame envy guilt | Red #EE4F4F |
| professions | doctor nurse teacher engineer lawyer pilot chef scientist artist soldier farmer banker writer judge architect accountant | Blue #45A1E8 |
| moral_concepts | justice fairness freedom authority power truth honor virtue loyalty courage mercy duty rights equality liberty conscience | Green #52C784 |
| nature | mountain river forest ocean sky earth fire wind rain snow desert valley storm sun moon thunder | Yellow #F9C22E |
| Task | Measures | Example |
| --- | --- | --- |
| Cluster Identification | Accuracy + time + confidence (Likert 1-5) | Circle all words you think belong to the same semantic group. |
| Similarity Judgment | Correctness vs. cosine distance + confidence | Which is closer to justice: fairness or law? |
| Spatial Reasoning | Rubric: insight depth, spatial language, accuracy | Explain why power is positioned near authority but far from freedom. |
| Category Membership | Binary correct/wrong | Does doctor belong closer to engineer or nurse? |
| Word Identification | Accuracy | Name the outlier word in this cluster. |
Cluster Identification tests spatial understanding at a coarse level — can users see the macro structure? This is where AR's ability to show 3D depth should most clearly help, since 2D plots project the third dimension flat and clusters overlap more.
Similarity Judgment tests fine-grained neighborhood relationships. In AR, users can physically walk toward two words and judge which feels physically closer; in 2D, they rely on pixel distance. AR should have an advantage here, because human spatial judgment is calibrated for real distances rather than screen coordinates.
Spatial Reasoning is qualitative and tests whether users build an accurate mental model of the geometry. AR users should be able to reference the physical layout (e.g., 'power is behind and to the left of authority') in a way 2D users cannot.
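One way to ground this rationale in numbers is to check whether the 2D projection ever reverses a raw-space similarity judgment. A minimal sketch, assuming generate_embeddings.py (Step 4) has already written the output/ arrays; it only covers triplets whose words are all in the 64-word list ('law' is not, so Task 2A needs check_similarity.py instead):

import numpy as np

X2 = np.load('output/X_2d.npy')    # normalized 2D positions
Xr = np.load('output/X_raw.npy')   # L2-normalized 300D vectors
words = list(np.load('output/words.npy'))
wi = {w: i for i, w in enumerate(words)}

def cos(a, b):
    return float(np.dot(a, b))  # unit vectors, so dot == cosine similarity

for tgt, a, b in [('doctor', 'nurse', 'engineer'), ('power', 'authority', 'freedom')]:
    if not all(w in wi for w in (tgt, a, b)):
        continue
    raw = a if cos(Xr[wi[tgt]], Xr[wi[a]]) > cos(Xr[wi[tgt]], Xr[wi[b]]) else b
    flat = a if (np.linalg.norm(X2[wi[tgt]] - X2[wi[a]])
                 < np.linalg.norm(X2[wi[tgt]] - X2[wi[b]])) else b
    verdict = 'agree' if raw == flat else 'DISAGREE'
    print(f'{tgt}: 300D picks {raw}, 2D plot picks {flat} ({verdict})')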
Create the evaluation Google Form now so it is ready for pilot testing. Go to forms.google.com and create a new form titled: Word Embedding Visualization Study. Add the following sections:
Section 1: Consent & Background
Name (short answer)
Familiarity with word embeddings: None / Heard of them / Used them / Expert (multiple choice)
How often do you use 3D/VR/AR applications: Never / Monthly / Weekly / Daily (multiple choice)
Section 2: Task Results (fill after 2D condition)
Cluster ID — Which words did you group? (paragraph)
Cluster ID — Confidence: 1 (guessing) to 5 (very confident) (linear scale)
Cluster ID — Time in seconds (short answer number)
Similarity Judgment — justice is closer to: fairness / law (multiple choice)
Similarity Judgment — doctor is closer to: nurse / engineer (multiple choice)
Similarity Judgment — Confidence: 1-5 (linear scale)
Spatial Reasoning response (paragraph)
Section 3: Task Results (fill after AR condition — same questions, new section)
Duplicate Section 2 exactly, label it 'AR Condition'.
Section 4: Comparison Questions
Which visualization helped you identify clusters better: 2D Plot / AR / About the same
Which helped you judge word similarity: 2D Plot / AR / About the same
Rate the 2D plot for overall usefulness: 1-5
Rate the AR visualization for overall usefulness: 1-5
Describe anything about the AR experience that helped or didn't help (paragraph)
Section 5: Open Feedback
What surprised you most about either visualization? (paragraph)
Would you use AR for data exploration in the future: Yes / Maybe / No
Save the form link
After creating the form, click the link icon to get a shareable URL. Save this in your evaluation/ folder as survey_link.txt. You will share this with classmates on Mar 3.
Create this file first. It is imported by every other script and defines the canonical word list and color mapping used in both Python plots and Unity.
# scripts/word_categories.py
# ─────────────────────────────────────────────────────────────
# Canonical word list and color assignments for the project.
# ALL other scripts import from here — never hardcode words elsewhere.
# ─────────────────────────────────────────────────────────────
CATEGORIES = {
'emotions': [
'joy', 'anger', 'fear', 'sadness', 'love', 'hate',
'happiness', 'grief', 'anxiety', 'hope', 'disgust',
'surprise', 'pride', 'shame', 'envy', 'guilt'
],
'professions': [
'doctor', 'nurse', 'teacher', 'engineer', 'lawyer',
'pilot', 'chef', 'scientist', 'artist', 'soldier',
'farmer', 'banker', 'writer', 'judge', 'architect', 'accountant'
],
'moral_concepts': [
'justice', 'fairness', 'freedom', 'authority', 'power',
'truth', 'honor', 'virtue', 'loyalty', 'courage',
'mercy', 'duty', 'rights', 'equality', 'liberty', 'conscience'
],
'nature': [
'mountain', 'river', 'forest', 'ocean', 'sky',
'earth', 'fire', 'wind', 'rain', 'snow',
'desert', 'valley', 'storm', 'sun', 'moon', 'thunder'
],
}
# Flat ordered list of all words (used for matrix construction)
ALL_WORDS = [w for cat in CATEGORIES.values() for w in cat]
# Maps each word to its category (reverse lookup)
WORD_TO_CATEGORY = {w: cat for cat, words in CATEGORIES.items() for w in words}
# RGB colors (0.0–1.0) for Unity; hex for matplotlib
CATEGORY_COLORS_UNITY = {
'emotions': (0.93, 0.31, 0.31), # Red
'professions': (0.27, 0.63, 0.91), # Blue
'moral_concepts':(0.32, 0.78, 0.52), # Green
    'nature':        (0.98, 0.76, 0.18),  # Yellow (matches #F9C22E)
}
CATEGORY_COLORS_HEX = {
'emotions': '#EE4F4F',
'professions': '#45A1E8',
'moral_concepts':'#52C784',
'nature': '#F9C22E',
}
if __name__ == '__main__':
print(f'Total words: {len(ALL_WORDS)}')
for cat, words in CATEGORIES.items():
print(f' {cat}: {len(words)} words')
# Verify it works:
python scripts/word_categories.py
# Expected:
# Total words: 64
# emotions: 16 words
# professions: 16 words
# moral_concepts: 16 words
# nature: 16 words
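Two invariants worth asserting once, since every other script and the Unity palette depend on this file: no word appears in two categories, and the Unity RGB tuples match the hex palette to within rounding. A minimal sketch (run from scripts/, or with scripts/ on PYTHONPATH):

from word_categories import ALL_WORDS, CATEGORY_COLORS_UNITY, CATEGORY_COLORS_HEX

# 1. No word may appear in two categories
assert len(ALL_WORDS) == len(set(ALL_WORDS)), 'duplicate word across categories'

# 2. Unity RGB tuples must agree with the matplotlib hex palette (within rounding)
for cat, (r, g, b) in CATEGORY_COLORS_UNITY.items():
    h = CATEGORY_COLORS_HEX[cat].lstrip('#')
    hr, hg, hb = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))
    assert max(abs(r - hr), abs(g - hg), abs(b - hb)) < 0.01, cat
print('word_categories.py invariants hold.')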
This module handles loading the GloVe file. It takes ~30 seconds on first load for the full 400K vocabulary.
# scripts/load_glove.py
# ─────────────────────────────────────────────────────────────
# Loads GloVe vectors from the plain-text file into a Python dict.
# Returns: { word (str) -> np.array of shape (300,) }
# ─────────────────────────────────────────────────────────────
import numpy as np
import os
def load_glove(glove_path='data/glove.6B.300d.txt', vocab=None):
"""
Load GloVe embeddings from text file.
Args:
glove_path: path to glove.6B.300d.txt
vocab: optional set of words to load (loads ALL if None)
Returns:
dict mapping word -> np.array(300,)
"""
if not os.path.exists(glove_path):
raise FileNotFoundError(
f'GloVe file not found at {glove_path}.\n'
f'Download from: https://nlp.stanford.edu/projects/glove/'
)
print(f'Loading GloVe vectors from {glove_path}...')
embeddings = {}
with open(glove_path, 'r', encoding='utf-8') as f:
for i, line in enumerate(f):
parts = line.strip().split()
if len(parts) != 301: # 1 word + 300 floats
continue
word = parts[0]
if vocab is not None and word not in vocab:
continue
embeddings[word] = np.array(parts[1:], dtype=np.float32)
if i % 50000 == 0 and i > 0:
print(f' Loaded {i:,} lines...')
print(f'Done. Loaded {len(embeddings):,} word vectors.')
return embeddings
def get_vectors_for_words(glove, word_list):
"""Extract vectors for a specific word list, reporting missing words."""
found, missing = [], []
vecs = []
for w in word_list:
if w in glove:
found.append(w)
vecs.append(glove[w])
else:
missing.append(w)
print(f' WARNING: "{w}" not found in GloVe vocabulary')
if missing:
print(f'Missing {len(missing)} words: {missing}')
return found, np.array(vecs)
if __name__ == '__main__':
from word_categories import ALL_WORDS
glove = load_glove(vocab=set(ALL_WORDS))
    # Spot-check two words that are actually in the filtered vocab
    # ('king' is not in ALL_WORDS, so it would always come back missing)
    for w in ('joy', 'justice'):
        vec = glove.get(w)
        print(f'{w}[:5]:', vec[:5] if vec is not None else 'NOT FOUND')
# Test: loads only your 64 words (much faster than full 400K)
python scripts/load_glove.py
# Expected: Done. Loaded 64 word vectors.
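One caveat: passing vocab skips every other word, so the returned dict cannot answer questions about words outside the list (for example 'law' from the Task 2 questions). Load with a larger set, or vocab=None for the full 400K vocabulary, when you need them:

from load_glove import load_glove

# Loads just the three words needed for one ground-truth comparison
glove = load_glove(vocab={'justice', 'fairness', 'law'})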
This is the main Python pipeline. It runs PCA and UMAP reductions and exports the JSON file that Unity will consume. Run this script once; the output is stable across runs with the same random_state.
# scripts/generate_embeddings.py
# ─────────────────────────────────────────────────────────────
# Main pipeline: load GloVe -> reduce dimensions -> export JSON + npy
# Runtime: ~2 minutes (UMAP dominates)
# ─────────────────────────────────────────────────────────────
import numpy as np
import json
import os
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize as sk_normalize
import umap
from load_glove import load_glove, get_vectors_for_words
from word_categories import CATEGORIES, ALL_WORDS, WORD_TO_CATEGORY
from word_categories import CATEGORY_COLORS_UNITY, CATEGORY_COLORS_HEX
OUTPUT_DIR = 'output'
def normalize_to_unit(arr):
"""Scale all coordinates so each axis spans [0, 1]."""
mn = arr.min(axis=0)
mx = arr.max(axis=0)
rng = mx - mn
rng[rng == 0] = 1.0 # Avoid divide-by-zero for degenerate axes
return (arr - mn) / rng
def cosine_sim(a, b):
return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10))
def main():
os.makedirs(OUTPUT_DIR, exist_ok=True)
# ── 1. Load GloVe ─────────────────────────────────────────
print('=== Step 1: Load GloVe ===')
glove = load_glove(vocab=set(ALL_WORDS))
words, X = get_vectors_for_words(glove, ALL_WORDS)
cats = [WORD_TO_CATEGORY[w] for w in words]
N = len(words)
print(f'Matrix shape: {X.shape}')
# ── 2. Normalize input vectors ────────────────────────────
print('\n=== Step 2: L2-normalize input vectors ===')
X_norm = sk_normalize(X, norm='l2') # Puts all vectors on unit sphere
# ── 3. PCA 2D (for baseline scatter plot) ────────────────
print('\n=== Step 3: PCA 2D ===')
pca2 = PCA(n_components=2, random_state=42)
X_2d = pca2.fit_transform(X_norm)
var2 = pca2.explained_variance_ratio_.sum()
print(f'PCA 2D explained variance: {var2:.1%}')
# ── 4. PCA 3D (Unity fallback) ────────────────────────────
print('\n=== Step 4: PCA 3D ===')
pca3 = PCA(n_components=3, random_state=42)
X_3d_pca = pca3.fit_transform(X_norm)
var3 = pca3.explained_variance_ratio_.sum()
print(f'PCA 3D explained variance: {var3:.1%}')
# ── 5. PCA 50D → UMAP 3D (primary) ───────────────────────
print('\n=== Step 5: PCA 50D intermediate ===')
pca50 = PCA(n_components=min(50, N - 1), random_state=42)
X_50 = pca50.fit_transform(X_norm)
print('\n=== Step 6: UMAP 3D (primary — takes ~2 min) ===')
reducer = umap.UMAP(
n_components=3,
n_neighbors=8,
min_dist=0.25,
metric='cosine',
random_state=42,
verbose=True
)
X_3d_umap = reducer.fit_transform(X_50)
print('UMAP done.')
# ── 6. Normalize all to [0, 1] ────────────────────────────
X_2d_n = normalize_to_unit(X_2d)
X_3d_pca_n = normalize_to_unit(X_3d_pca)
X_3d_umap_n = normalize_to_unit(X_3d_umap)
# ── 7. Build JSON export ──────────────────────────────────
print('\n=== Step 7: Build JSON export ===')
entries = []
for i, w in enumerate(words):
cat = cats[i]
col = CATEGORY_COLORS_UNITY[cat]
entries.append({
'word': w,
'category': cat,
'color': {'r': round(col[0],4), 'g': round(col[1],4), 'b': round(col[2],4)},
'pos_2d': {'x': round(float(X_2d_n[i,0]),6),
'y': round(float(X_2d_n[i,1]),6)},
'pos_3d_umap': {'x': round(float(X_3d_umap_n[i,0]),6),
'y': round(float(X_3d_umap_n[i,1]),6),
'z': round(float(X_3d_umap_n[i,2]),6)},
'pos_3d_pca': {'x': round(float(X_3d_pca_n[i,0]),6),
'y': round(float(X_3d_pca_n[i,1]),6),
'z': round(float(X_3d_pca_n[i,2]),6)},
})
dataset = {
'metadata': {
'total': N,
'categories': list(CATEGORIES.keys()),
'reduction_primary':'umap_cosine',
'pca2d_variance': round(var2, 4),
'pca3d_variance': round(var3, 4),
'umap_params': {
'n_neighbors': 8, 'min_dist': 0.25, 'metric': 'cosine'
},
},
'words': entries
}
# ── 8. Save outputs ───────────────────────────────────────
json_path = f'{OUTPUT_DIR}/embeddings.json'
with open(json_path, 'w') as f:
json.dump(dataset, f, indent=2)
print(f'Saved: {json_path}')
np.save(f'{OUTPUT_DIR}/X_2d.npy', X_2d_n)
np.save(f'{OUTPUT_DIR}/X_3d_umap.npy', X_3d_umap_n)
np.save(f'{OUTPUT_DIR}/X_3d_pca.npy', X_3d_pca_n)
np.save(f'{OUTPUT_DIR}/X_raw.npy', X_norm)
np.save(f'{OUTPUT_DIR}/words.npy', np.array(words))
np.save(f'{OUTPUT_DIR}/cats.npy', np.array(cats))
print('Saved numpy arrays.')
# ── 9. Print similarity spot-check ───────────────────────
print('\n=== Spot-check: cosine similarities (raw 300D) ===')
wi = {w: i for i, w in enumerate(words)}
pairs = [('justice','fairness'), ('justice','law'),
('doctor','nurse'), ('doctor','engineer')]
for a, b in pairs:
if a in wi and b in wi:
s = cosine_sim(X_norm[wi[a]], X_norm[wi[b]])
print(f' {a} <-> {b}: {s:.4f}')
print('\nAll done! Check output/embeddings.json')
return words, cats, X_2d_n
if __name__ == '__main__':
main()
# Run from project root:
python scripts/generate_embeddings.py
# Expected final lines:
# Saved: output/embeddings.json
# Saved numpy arrays.
# Spot-check: cosine similarities (raw 300D)
# justice <-> fairness: 0.7xxx
# justice <-> law: 0.6xxx
# doctor <-> nurse: 0.8xxx
# doctor <-> engineer: 0.5xxx
# All done! Check output/embeddings.json
UMAP is non-deterministic by default
The random_state=42 argument makes UMAP reproducible. If you change this number or remove it, you will get a different 3D layout each run. Keep random_state=42 throughout the project for consistency across your evaluation conditions.
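To convince yourself the layout really is reproducible, a standalone sketch on synthetic data (recent umap-learn versions also drop to a single-threaded path when random_state is set, so seeded runs are slower):

import numpy as np
import umap

# Synthetic stand-in for the PCA-50 matrix (64 words x 50 dims)
X = np.random.RandomState(0).rand(64, 50).astype(np.float32)
a = umap.UMAP(n_components=3, random_state=42).fit_transform(X)
b = umap.UMAP(n_components=3, random_state=42).fit_transform(X)
print('identical layouts:', np.allclose(a, b))  # Expected: True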
Creates the 2D baseline scatter plot that participants will use in the non-AR condition of the evaluation. Save this plot — you will print it or display it on screen during the Mar 3 pilot.
# scripts/plot_2d.py
# ─────────────────────────────────────────────────────────────
# Generates 2D PCA scatter plot for the baseline evaluation condition.
# Run AFTER generate_embeddings.py
# ─────────────────────────────────────────────────────────────
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import os
from word_categories import CATEGORY_COLORS_HEX
def plot_2d(X, words, cats, title, filename, show_labels=True,
figsize=(14, 11), dpi=150):
fig, ax = plt.subplots(figsize=figsize, dpi=dpi)
ax.set_facecolor('#F8F9FA')
fig.patch.set_facecolor('#FFFFFF')
for i, (word, cat) in enumerate(zip(words, cats)):
c = CATEGORY_COLORS_HEX[cat]
ax.scatter(X[i,0], X[i,1], c=c, s=90, alpha=0.88,
zorder=3, edgecolors='white', linewidths=0.6)
if show_labels:
ax.annotate(word, (X[i,0], X[i,1]),
fontsize=8.5, alpha=0.92,
xytext=(5, 5), textcoords='offset points',
fontfamily='DejaVu Sans')
legend = [mpatches.Patch(color=v, label=k.replace('_',' ').title())
for k, v in CATEGORY_COLORS_HEX.items()]
ax.legend(handles=legend, loc='upper left', framealpha=0.95,
fontsize=10, title='Category', title_fontsize=10)
ax.set_title(title, fontsize=14, fontweight='bold', pad=16)
ax.set_xlabel('PCA Dimension 1', fontsize=11)
ax.set_ylabel('PCA Dimension 2', fontsize=11)
ax.grid(True, alpha=0.3, zorder=0, linestyle='--')
ax.set_xlim(-0.05, 1.05)
ax.set_ylim(-0.05, 1.05)
plt.tight_layout()
os.makedirs('output', exist_ok=True)
plt.savefig(f'output/{filename}', dpi=dpi, bbox_inches='tight')
print(f'Saved: output/{filename}')
plt.close()
def plot_similarity_comparison():
"""Creates a separate plot showing similarity relationships for Task 2."""
X = np.load('output/X_2d.npy')
words = list(np.load('output/words.npy'))
cats = list(np.load('output/cats.npy'))
wi = {w: i for i, w in enumerate(words)}
fig, ax = plt.subplots(figsize=(10, 8), dpi=150)
ax.set_facecolor('#F0F4FF')
# Highlight the Task 2 triplets
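    # NOTE: 'law' is not in the 64-word list, so the `if w in wi` guards
    # below skip it silently; the justice<->law ground truth comes from
    # check_similarity.py, which queries GloVe directly.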
task2_groups = [
('justice', 'fairness', 'law', 'Task 2A'),
('doctor', 'nurse', 'engineer', 'Task 2B'),
]
for tgt, close, far, label in task2_groups:
for w in [tgt, close, far]:
if w in wi:
cat = cats[wi[w]]
ax.scatter(X[wi[w],0], X[wi[w],1],
c=CATEGORY_COLORS_HEX[cat], s=180,
edgecolors='black', linewidths=1.5, zorder=5)
ax.annotate(w, (X[wi[w],0], X[wi[w],1]),
fontsize=11, fontweight='bold',
xytext=(7, 7), textcoords='offset points')
# Draw lines showing distances
if tgt in wi and close in wi:
ax.plot([X[wi[tgt],0], X[wi[close],0]],
[X[wi[tgt],1], X[wi[close],1]],
'g--', alpha=0.6, linewidth=1.5, label=f'{tgt}-{close}')
if tgt in wi and far in wi:
ax.plot([X[wi[tgt],0], X[wi[far],0]],
[X[wi[tgt],1], X[wi[far],1]],
'r--', alpha=0.6, linewidth=1.5, label=f'{tgt}-{far}')
ax.set_title('Similarity Judgment Task — Target Word Distances', fontsize=13, pad=14)
ax.legend(fontsize=9)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('output/plot_task2_similarity.png', dpi=150, bbox_inches='tight')
print('Saved: output/plot_task2_similarity.png')
plt.close()
if __name__ == '__main__':
X = np.load('output/X_2d.npy')
words = list(np.load('output/words.npy'))
cats = list(np.load('output/cats.npy'))
# Main labeled plot (for participants)
plot_2d(X, words, cats,
title='Word Embeddings — PCA 2D (Participant View)',
filename='plot_pca_2d_labeled.png',
show_labels=True)
# Unlabeled version (for cluster ID task — labels revealed after)
plot_2d(X, words, cats,
title='Word Embeddings — PCA 2D (Identify Clusters)',
filename='plot_pca_2d_no_labels.png',
show_labels=False)
# Similarity task helper
plot_similarity_comparison()
print('All plots saved to output/')
python scripts/plot_2d.py
# Saves:
# output/plot_pca_2d_labeled.png
# output/plot_pca_2d_no_labels.png
# output/plot_task2_similarity.png
Utility script for computing ground-truth answers to Task 2 similarity judgment questions. Run this to generate the answer key.
# scripts/check_similarity.py
# ─────────────────────────────────────────────────────────────
# Compute cosine similarity between words in the original 300D space.
# Usage: python scripts/check_similarity.py <target> <word1> <word2> ...
# Example: python scripts/check_similarity.py justice fairness law
# ─────────────────────────────────────────────────────────────
import sys
import numpy as np
from numpy.linalg import norm
from load_glove import load_glove
def cosine(a, b):
return float(np.dot(a, b) / (norm(a) * norm(b) + 1e-10))
if __name__ == '__main__':
if len(sys.argv) < 3:
print('Usage: python check_similarity.py <target> <word1> [word2] ...')
sys.exit(1)
target = sys.argv[1]
compare_words = sys.argv[2:]
all_words = [target] + compare_words
glove = load_glove(vocab=set(all_words))
if target not in glove:
print(f'ERROR: "{target}" not in GloVe'); sys.exit(1)
print(f'\nCosine similarity to "{target}": (higher = more similar)')
results = []
for w in compare_words:
if w in glove:
s = cosine(glove[target], glove[w])
results.append((w, s))
else:
print(f' WARNING: "{w}" not in GloVe')
results.sort(key=lambda x: -x[1])
for w, s in results:
bar = '#' * int(s * 40)
print(f' {w:20s}: {s:.4f} {bar}')
if results:
closest = results[0][0]
print(f'\n --> Closest to "{target}" is: "{closest}"')
# Generate answer key for all evaluation tasks:
python scripts/check_similarity.py justice fairness law
python scripts/check_similarity.py doctor nurse engineer
python scripts/check_similarity.py power authority freedom
# Save output to evaluation/answer_key.txt
# (> creates the file fresh; >> appends, so reruns don't duplicate the first entry):
python scripts/check_similarity.py justice fairness law > evaluation/answer_key.txt
python scripts/check_similarity.py doctor nurse engineer >> evaluation/answer_key.txt
python scripts/check_similarity.py power authority freedom >> evaluation/answer_key.txt
Before moving to Unity, confirm the JSON is valid and complete:
# Quick JSON validation:
python -c "
import json
with open('output/embeddings.json') as f:
d = json.load(f)
print('Total words:', d['metadata']['total'])
print('Categories:', d['metadata']['categories'])
print('PCA 2D variance:', d['metadata']['pca2d_variance'])
print('PCA 3D variance:', d['metadata']['pca3d_variance'])
print()
print('First entry:')
import pprint; pprint.pprint(d['words'][0])
"
# Expected output:
# Total words: 64
# Categories: ['emotions', 'professions', 'moral_concepts', 'nature']
# PCA 2D variance: 0.29xx
# PCA 3D variance: 0.xxxx
# First entry:
# {'category': 'emotions',
# 'color': {'b': 0.31, 'g': 0.31, 'r': 0.93},
# 'pos_2d': {'x': 0.xxxxxx, 'y': 0.xxxxxx},
# 'pos_3d_pca': {'x': ..., 'y': ..., 'z': ...},
# 'pos_3d_umap': {'x': ..., 'y': ..., 'z': ...},
# 'word': 'joy'}
Now that the Python pipeline is working, copy the output to Unity:
# macOS/Linux:
cp output/embeddings.json /path/to/WordEmbeddingsAR/Assets/Data/embeddings.json
# Windows (PowerShell):
Copy-Item output\embeddings.json 'C:\path\to\WordEmbeddingsAR\Assets\Data\embeddings.json'
# Also keep a copy in unity_export/ for reference:
cp output/embeddings.json unity_export/embeddings.json
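Equivalently in Python, if you prefer to script the copy as part of the pipeline (the Unity path is a placeholder; substitute your own):

import shutil

# Hypothetical destination: replace with your actual Unity project path
UNITY_DATA = '/path/to/WordEmbeddingsAR/Assets/Data/embeddings.json'
shutil.copyfile('output/embeddings.json', UNITY_DATA)
shutil.copyfile('output/embeddings.json', 'unity_export/embeddings.json')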
After copying, click inside the Unity editor window — it will auto-detect and import the new file.
Create Assets/Scripts/WordData.cs. These classes mirror the JSON structure exactly so Unity's JsonUtility can deserialize the file automatically.
// Assets/Scripts/WordData.cs
// ─────────────────────────────────────────────────────────────
// Data classes that match the embeddings.json structure.
// Must exactly mirror the JSON field names.
// ─────────────────────────────────────────────────────────────
using System;
using System.Collections.Generic;
[Serializable]
public class ColorData {
public float r, g, b;
public UnityEngine.Color ToUnityColor(float a = 1f) =>
new UnityEngine.Color(r, g, b, a);
}
[Serializable]
public class Vec2Data {
public float x, y;
public UnityEngine.Vector2 ToVector2() => new UnityEngine.Vector2(x, y);
}
[Serializable]
public class Vec3Data {
public float x, y, z;
public UnityEngine.Vector3 ToVector3() => new UnityEngine.Vector3(x, y, z);
}
[Serializable]
public class UmapParams {
public int n_neighbors;
public float min_dist;
public string metric;
}
[Serializable]
public class EmbeddingMetadata {
public int total;
public string[] categories;
public string reduction_primary;
public float pca2d_variance;
public float pca3d_variance;
public UmapParams umap_params;
}
[Serializable]
public class WordEntry {
public string word;
public string category;
public ColorData color;
public Vec2Data pos_2d;
public Vec3Data pos_3d_umap;
public Vec3Data pos_3d_pca;
}
[Serializable]
public class EmbeddingDataset {
public EmbeddingMetadata metadata;
public List<WordEntry> words;
}
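JsonUtility matches fields by exact name and silently leaves unmatched fields at their defaults, so a key mismatch shows up as empty data rather than an error. A quick Python-side check that the exported keys line up with these classes:

import json

with open('output/embeddings.json') as f:
    d = json.load(f)

assert set(d['words'][0]) == {'word', 'category', 'color',
                              'pos_2d', 'pos_3d_umap', 'pos_3d_pca'}
assert set(d['metadata']) == {'total', 'categories', 'reduction_primary',
                              'pca2d_variance', 'pca3d_variance', 'umap_params'}
print('JSON keys match the C# data classes.')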
Create Assets/Scripts/WordPoint.cs. Each sphere in the scene has this script attached.
// Assets/Scripts/WordPoint.cs
// ─────────────────────────────────────────────────────────────
// Attached to every word sphere. Handles color, highlight,
// selection, billboard label, and proximity glow.
// ─────────────────────────────────────────────────────────────
using UnityEngine;
using TMPro;
public class WordPoint : MonoBehaviour
{
[Header("Visual References")]
public Renderer sphereRenderer;
public TextMeshPro label;
public GameObject selectionRing; // Optional highlight ring object
[Header("Behavior")]
public float baseScale = 0.05f;
public float selectedScale = 0.08f;
public float hoverScale = 0.065f;
// ── Public read-only properties ──────────────────────────
public string Word { get; private set; }
public string Category { get; private set; }
public Color BaseColor{ get; private set; }
// ── Private state ─────────────────────────────────────────
private bool _selected = false;
private bool _hovered = false;
private MaterialPropertyBlock _mpb;
// ── Initialization ────────────────────────────────────────
public void Initialize(string word, string category, Color color)
{
Word = word;
Category = category;
BaseColor = color;
if (label != null)
{
label.text = word;
label.color = Color.white;
label.fontSize = 0.25f;
}
_mpb = new MaterialPropertyBlock();
ApplyColor(color);
transform.localScale = Vector3.one * baseScale;
if (selectionRing != null)
selectionRing.SetActive(false);
}
// ── Update: billboard ─────────────────────────────────────
void Update()
{
if (label != null && Camera.main != null)
{
// Make label face camera but stay upright
Vector3 dir = Camera.main.transform.position - label.transform.position;
dir.y = 0; // Lock Y so labels don't tilt
if (dir != Vector3.zero)
label.transform.rotation = Quaternion.LookRotation(-dir, Vector3.up);
}
}
// ── Public API ────────────────────────────────────────────
public void SetHovered(bool hovered)
{
if (_selected) return; // Don't override selection
_hovered = hovered;
if (hovered)
{
ApplyColor(Color.Lerp(BaseColor, Color.white, 0.45f));
transform.localScale = Vector3.one * hoverScale;
}
else
{
ApplyColor(BaseColor);
transform.localScale = Vector3.one * baseScale;
}
}
public void SetSelected(bool selected)
{
_selected = selected;
if (selected)
{
ApplyColor(Color.white);
transform.localScale = Vector3.one * selectedScale;
if (selectionRing != null) selectionRing.SetActive(true);
}
else
{
ApplyColor(BaseColor);
transform.localScale = Vector3.one * baseScale;
if (selectionRing != null) selectionRing.SetActive(false);
}
}
    public void SetCategoryHighlight(bool highlighted)
    {
        if (_selected) return;
        if (highlighted)
        {
            // Brighten words in the highlighted category
            ApplyColor(Color.Lerp(BaseColor, Color.white, 0.3f));
        }
        else
        {
            // Dim everything else so the category stands out
            const float dimFactor = 0.4f;
            ApplyColor(new Color(BaseColor.r * dimFactor,
                                 BaseColor.g * dimFactor,
                                 BaseColor.b * dimFactor));
        }
    }
public void ResetVisual()
{
_selected = false;
_hovered = false;
ApplyColor(BaseColor);
transform.localScale = Vector3.one * baseScale;
if (selectionRing != null) selectionRing.SetActive(false);
}
// ── Private helpers ───────────────────────────────────────
void ApplyColor(Color c)
{
if (sphereRenderer == null) return;
_mpb.SetColor("_Color", c);
sphereRenderer.SetPropertyBlock(_mpb);
}
// ── Physics trigger for proximity effects ─────────────────
void OnTriggerEnter(Collider other)
{
// Called if another word point's collider enters range
// (used for proximity-based cluster highlighting in AR)
}
}
Create Assets/Scripts/EmbeddingLoader.cs. This is the main scene manager — it reads the JSON and instantiates all word spheres.
// Assets/Scripts/EmbeddingLoader.cs
// ─────────────────────────────────────────────────────────────
// Reads embeddings.json, spawns WordPoint prefabs, and provides
// category-level operations for the interaction system.
// ─────────────────────────────────────────────────────────────
using UnityEngine;
using System.Collections.Generic;
using System.Linq;
public class EmbeddingLoader : MonoBehaviour
{
[Header("Data")]
public TextAsset embeddingsJson;
[Header("Prefabs")]
public GameObject wordPointPrefab;
[Header("Layout Settings")]
[Tooltip("World-space size of the full cloud in meters")]
public float cloudSizeMeters = 2.5f;
[Tooltip("Use UMAP layout (true) or PCA layout (false)")]
public bool useUMAP = true;
[Header("Category Filter")]
public bool showEmotions = true;
public bool showProfessions = true;
public bool showMoralConcepts = true;
public bool showNature = true;
// ── Public state ──────────────────────────────────────────
public EmbeddingDataset Dataset { get; private set; }
public List<WordPoint> AllPoints { get; private set; } = new();
public List<WordPoint> ActivePoints => AllPoints.Where(p => p.gameObject.activeSelf).ToList();
void Start() => LoadAndSpawn();
// ── Load & Spawn ──────────────────────────────────────────
public void LoadAndSpawn()
{
if (embeddingsJson == null)
{ Debug.LogError("EmbeddingLoader: no JSON asset assigned!"); return; }
Dataset = JsonUtility.FromJson<EmbeddingDataset>(embeddingsJson.text);
Debug.Log($"Loaded {Dataset.words.Count} words | UMAP={useUMAP}");
// Clear any previously spawned points
foreach (Transform child in transform)
Destroy(child.gameObject);
AllPoints.Clear();
foreach (var entry in Dataset.words)
{
if (!IsCategoryVisible(entry.category)) continue;
// Choose 3D position source
Vector3 localPos = useUMAP
? entry.pos_3d_umap.ToVector3()
: entry.pos_3d_pca.ToVector3();
// Scale [0,1] coordinates to world-space cloud size
// Center on origin: shift by -0.5 before scaling
Vector3 worldPos = (localPos - Vector3.one * 0.5f) * cloudSizeMeters;
GameObject go = Instantiate(wordPointPrefab, transform);
go.name = entry.word;
go.transform.localPosition = worldPos;
WordPoint wp = go.GetComponent<WordPoint>();
wp.Initialize(entry.word, entry.category, entry.color.ToUnityColor());
AllPoints.Add(wp);
}
Debug.Log($"Spawned {AllPoints.Count} word points.");
}
// ── Category operations ───────────────────────────────────
public void HighlightCategory(string category)
{
foreach (var wp in AllPoints)
wp.SetCategoryHighlight(wp.Category == category);
}
public void ClearAllHighlights()
{
foreach (var wp in AllPoints)
wp.ResetVisual();
}
public List<WordPoint> GetCategory(string category) =>
AllPoints.Where(p => p.Category == category).ToList();
public WordPoint GetWordPoint(string word) =>
AllPoints.FirstOrDefault(p => p.Word == word);
// ── Visibility toggle (called from UI) ───────────────────
public void ToggleCategory(string category, bool visible)
{
foreach (var wp in AllPoints)
if (wp.Category == category)
wp.gameObject.SetActive(visible);
}
bool IsCategoryVisible(string cat) => cat switch
{
"emotions" => showEmotions,
"professions" => showProfessions,
"moral_concepts" => showMoralConcepts,
"nature" => showNature,
_ => true
};
// ── Runtime layout switch ─────────────────────────────────
public void SwitchLayout(bool umap)
{
useUMAP = umap;
LoadAndSpawn(); // Re-spawn with new positions
}
}
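The centering math in LoadAndSpawn assumes every exported coordinate already lies in [0, 1] (guaranteed by normalize_to_unit in generate_embeddings.py). A quick Python-side audit if spheres ever spawn off-center:

import numpy as np

for name in ('X_2d', 'X_3d_umap', 'X_3d_pca'):
    arr = np.load(f'output/{name}.npy')
    print(f'{name}: min={arr.min():.3f} max={arr.max():.3f}')  # Expect 0.000 / 1.000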
Create Assets/Scripts/InteractionManager.cs. This handles all user input — gaze, controller ray, selection, and the info panel.
// Assets/Scripts/InteractionManager.cs
// ─────────────────────────────────────────────────────────────
// Handles ray-based interaction: hover, selection, category highlight.
// Works with both mouse (editor) and Quest controllers (device).
// ─────────────────────────────────────────────────────────────
using UnityEngine;
using TMPro;
using System.Collections.Generic;
public class InteractionManager : MonoBehaviour
{
[Header("References")]
public EmbeddingLoader loader;
public TextMeshProUGUI infoPanel;
public LineRenderer laserLine;
[Header("Settings")]
public float maxRayDistance = 15f;
public LayerMask wordPointLayer;
// ── Private state ─────────────────────────────────────────
private WordPoint _hovered = null;
private WordPoint _selected = null;
private bool _usingController = false;
void Update()
{
// Detect if a controller is connected
_usingController = OVRInput.GetConnectedControllers() != OVRInput.Controller.None;
Vector3 origin, direction;
GetRayOriginAndDirection(out origin, out direction);
UpdateLaser(origin, direction);
HandleRaycast(origin, direction);
HandleSelectionInput();
}
// ── Ray source: controller or gaze ───────────────────────
void GetRayOriginAndDirection(out Vector3 origin, out Vector3 direction)
{
if (_usingController)
{
origin = OVRInput.GetLocalControllerPosition(OVRInput.Controller.RTouch);
direction = OVRInput.GetLocalControllerRotation(OVRInput.Controller.RTouch) * Vector3.forward;
}
        else if (Application.isEditor)
        {
            // Editor: cast from the mouse cursor so hover follows the mouse
            Ray mouseRay = Camera.main.ScreenPointToRay(Input.mousePosition);
            origin = mouseRay.origin;
            direction = mouseRay.direction;
        }
        else
        {
            // On device without a controller: fall back to head gaze
            origin = Camera.main.transform.position;
            direction = Camera.main.transform.forward;
        }
}
// ── Hover ─────────────────────────────────────────────────
void HandleRaycast(Vector3 origin, Vector3 direction)
{
Ray ray = new Ray(origin, direction);
RaycastHit hit;
if (Physics.Raycast(ray, out hit, maxRayDistance, wordPointLayer))
{
WordPoint wp = hit.collider.GetComponent<WordPoint>();
if (wp != null && wp != _hovered)
{
if (_hovered != null && _hovered != _selected)
_hovered.SetHovered(false);
_hovered = wp;
wp.SetHovered(true);
ShowInfo(wp);
}
}
else if (_hovered != null)
{
if (_hovered != _selected)
_hovered.SetHovered(false);
_hovered = null;
if (_selected == null) ClearInfo();
}
}
// ── Selection (trigger press or mouse click) ─────────────
void HandleSelectionInput()
{
bool selectPressed = _usingController
? OVRInput.GetDown(OVRInput.Button.PrimaryIndexTrigger, OVRInput.Controller.RTouch)
: Input.GetMouseButtonDown(0);
bool clearPressed = _usingController
? OVRInput.GetDown(OVRInput.Button.Two, OVRInput.Controller.RTouch)
: Input.GetMouseButtonDown(1);
if (selectPressed && _hovered != null)
SelectWord(_hovered);
if (clearPressed)
ClearSelection();
}
// ── Selection logic ───────────────────────────────────────
public void SelectWord(WordPoint wp)
{
if (_selected != null && _selected != wp)
_selected.SetSelected(false);
if (_selected == wp) // Toggle off
{
wp.SetSelected(false);
_selected = null;
loader.ClearAllHighlights();
ClearInfo();
return;
}
_selected = wp;
wp.SetSelected(true);
loader.HighlightCategory(wp.Category);
ShowInfo(wp);
}
public void ClearSelection()
{
if (_selected != null) _selected.SetSelected(false);
_selected = null;
loader.ClearAllHighlights();
ClearInfo();
}
// ── Laser visual ──────────────────────────────────────────
    void UpdateLaser(Vector3 origin, Vector3 direction)
    {
        if (laserLine == null) return;
        if (!_usingController) { laserLine.enabled = false; return; }  // Hide laser without a controller
        laserLine.enabled = true;
RaycastHit hit;
Vector3 end = Physics.Raycast(origin, direction, out hit, maxRayDistance, wordPointLayer)
? hit.point
: origin + direction * maxRayDistance;
laserLine.SetPosition(0, origin);
laserLine.SetPosition(1, end);
}
// ── Info panel ────────────────────────────────────────────
void ShowInfo(WordPoint wp)
{
if (infoPanel == null) return;
string catDisplay = wp.Category.Replace('_', ' ');
infoPanel.text = $"<b><size=28>{wp.Word}</size></b>\n" +
$"<size=20>Category: {catDisplay}\n" +
$"Trigger: select | B: clear</size>";
}
void ClearInfo()
{
if (infoPanel != null)
infoPanel.text = "<size=20>Point controller at a word sphere\nand pull trigger to select.</size>";
}
}
Create Assets/Scripts/CategoryLegend.cs for the floating color legend:
// Assets/Scripts/CategoryLegend.cs
using UnityEngine;
using TMPro;
public class CategoryLegend : MonoBehaviour
{
public TextMeshPro legendText;
public EmbeddingLoader loader;
private bool[] _visible = { true, true, true, true };
private readonly string[] _categories =
{ "emotions", "professions", "moral_concepts", "nature" };
private readonly string[] _labels =
{ "Emotions", "Professions", "Moral Concepts", "Nature" };
private readonly string[] _colors =
{ "#EE4F4F", "#45A1E8", "#52C784", "#F9C22E" };
void Start() => RebuildText();
    void Update()
    {
        // Billboard toward the camera. TMP text reads correctly when its
        // forward axis points AWAY from the viewer, so look along the
        // camera-to-legend direction (a plain LookAt would mirror the text).
        if (Camera.main != null)
            transform.rotation = Quaternion.LookRotation(
                transform.position - Camera.main.transform.position);
    }
void RebuildText()
{
if (legendText == null) return;
System.Text.StringBuilder sb = new System.Text.StringBuilder();
sb.AppendLine("<b>Categories</b>");
for (int i = 0; i < _categories.Length; i++)
{
string alpha = _visible[i] ? "FF" : "44";
sb.AppendLine($"<color={_colors[i]}{alpha}> {_labels[i]}</color>");
}
legendText.text = sb.ToString().TrimEnd();
}
// Call from UI button to toggle category visibility
public void ToggleCategory(int index)
{
if (index < 0 || index >= _categories.Length) return;
_visible[index] = !_visible[index];
loader?.ToggleCategory(_categories[index], _visible[index]);
RebuildText();
}
}
Follow each numbered step precisely. This is the most detail-oriented part of the Unity setup.
In the Hierarchy panel: right-click → 3D Object → Sphere. Rename it to WordPoint.
With WordPoint selected, in the Inspector set Transform Scale to: X=0.05, Y=0.05, Z=0.05.
In Assets/Materials, right-click → Create → Material and name it WordPointMat, then drag it onto the WordPoint sphere (the sphere's built-in Default-Material is read-only and cannot be renamed).
Set WordPointMat properties: Rendering Mode → Transparent, Albedo → white, Metallic → 0.1, Smoothness → 0.7.
Add a new Layer called 'WordPoints': Edit → Project Settings → Tags and Layers → add 'WordPoints' in Layers.
Set the WordPoint sphere's Layer to WordPoints (Inspector → Layer dropdown).
With WordPoint selected: Add Component → Word Point (the script from Assets/Scripts/WordPoint.cs).
Right-click WordPoint in Hierarchy → Create Empty Child. Name the child Label.
Select Label → Add Component → TextMeshPro - Text (3D).
In the TextMeshPro component: Font Size = 0.25, Alignment = Center Middle, Color = white, Text Wrapping = disabled.
Set Label's local position: X=0, Y=0.09, Z=0 (floats just above the sphere).
Select the WordPoint parent sphere again. In the WordPoint script component: drag the sphere's MeshRenderer into 'Sphere Renderer', drag Label into 'Label'.
(Optional) For the selection ring: Hierarchy → right-click WordPoint → 3D Object → Cylinder. Rename it SelectionRing. Scale: 0.06, 0.002, 0.06. Add a material with Emission enabled (bright white). Drag into 'Selection Ring' field on WordPoint script.
Test: Press Play. In the Game view, click on the sphere. It should turn white and scale up.
Drag the WordPoint from the Hierarchy into Assets/Prefabs/ to create the prefab.
Delete the WordPoint from the Hierarchy — we will spawn it programmatically.
Layer mask on InteractionManager
After creating the 'WordPoints' layer, set the 'Word Point Layer' field on InteractionManager to 'WordPoints'. This makes raycasts only hit word spheres, not other objects in the scene (floor, walls, etc.).
Hierarchy → Create Empty → rename to EmbeddingCloud.
Set position: X=0, Y=1, Z=2 (1m above floor, 2m in front of camera start position).
Add Component → Embedding Loader.
In the EmbeddingLoader component: drag Assets/Data/embeddings.json into 'Embeddings Json'.
Drag Assets/Prefabs/WordPoint into 'Word Point Prefab'.
Set Cloud Size Meters to 2.5, Use UMAP to checked.
All category checkboxes checked.
Hierarchy → Create Empty → rename to InteractionManager.
Add Component → Interaction Manager.
Drag EmbeddingCloud into the 'Loader' field.
Set Word Point Layer to 'WordPoints'.
Max Ray Distance: 15.
Hierarchy → UI → Canvas → rename to InfoCanvas.
Canvas component: Render Mode → World Space.
RectTransform: Width=500, Height=180.
Transform: Position=(0, 1.8, 1.3), Scale=(0.003, 0.003, 0.003).
Add a UI/Image child: color=(0, 0, 0, 0.7). Anchor stretched to fill.
Add a UI/Text - TextMeshPro child: rename to InfoText. Font size 22. Color white. Overflow: Ellipsis.
Drag InfoText into InteractionManager → 'Info Panel' field.
Hierarchy → Create Empty → rename to LegendAnchor. Position: (1.5, 1.4, 2).
With LegendAnchor selected: Add Component → TextMeshPro - Text (3D).
Add Component → Category Legend. Drag the TextMeshPro component into the 'Legend Text' field and EmbeddingCloud into the 'Loader' field.
Set font size 0.22, color white.
Hierarchy → Create Empty → rename to LaserPointer.
Add Component → Line Renderer.
In Line Renderer: Positions size=2, Width Curve = 0.004 → 0.001 (tapers to tip).
Material: create a new material called LaserMat. Shader: Unlit/Color. Color: white.
Assign LaserMat to Line Renderer material.
Drag the Line Renderer into InteractionManager → 'Laser Line' field.
Before building to device, verify everything in the editor:
Press Play. Console should show: 'Loaded 64 words | UMAP=True' and 'Spawned 64 word points.'
64 colored spheres should appear in a 3D cloud in the scene view.
Hover the mouse over a sphere (in Game view) — it should brighten.
Click a sphere — it turns white and same-category spheres brighten.
Click again — deselects and all colors reset.
The InfoCanvas shows the word name and category on hover.
The legend shows all four categories in their correct colors.
If no spheres appear
1. Check the Console for errors. Most common: 'no JSON asset assigned' — drag embeddings.json into the EmbeddingLoader component.
2. Check that WordPoint prefab is assigned in EmbeddingLoader.
3. Check EmbeddingCloud transform — if position is far from camera, spheres may be out of view. Try position (0, 0, 0) to start.
4. If you see NullReferenceException on WordPoint.Initialize, check that both Sphere Renderer and Label are assigned in the prefab.
Google Form: https://forms.gle/SByHy5VjcnpzgsPv8
2D Graphs: https://docs.google.com/document/d/1g1xE_BeGBtsN0BV5dNABPlCcpeQIqP3WPiprV1rXv5k/edit?tab=t.0
Apk: WordEmbeddings
Responses: https://docs.google.com/spreadsheets/d/1g4cxgcjpdEynabruNfXNwCbAPKlRgZZ_o1Uepa-Gsig/edit?usp=sharing
Embeddings Encode Surprising Semantic Structure
"doctor" is much closer to "nurse" (0.84) than "engineer" (0.52) — cosine similarity reveals non-obvious relationships.
VR Presence Changes Spatial Intuition
Physically moving between clusters may fundamentally change how participants encode semantic relationships.
Dimensionality Reduction: Lossy is OK
~29% variance explained sounds bad, but perceptual cluster separation is what matters for user comprehension.
Moving in VR Adds to Data Understanding
Being able to walk around the cloud let users view the data from multiple angles and build a deeper understanding of its structure.
Low Variance Explained ≠ Bad Visualization
PCA only captures ~29% variance, yet clusters are still visually meaningful — layout matters more than variance.