This project builds on Project 1. Complete all steps of the Project 1 tutorial before starting here.
This tutorial walks through the complete pipeline for visualizing high-dimensional word embeddings as a continuous semantic field in Unity, deployed to the Meta Quest 3 headset. The system uses the same 64 semantically categorized words as Project 1, but instead of relying on individually labeled spheres, it fills the space with colored density fog clouds and gradient flow lines derived from kernel density estimate (KDE) scalar fields. A real-time semantic probe reports the field's category composition at whatever point the user aims at.
The project compares a 2D flow field plot against the VR semantic field environment on tasks focused on understanding boundary structure, gradient direction, and category transitions — a fundamentally different evaluation than Project 1's similarity judgment and cluster identification tasks.
APK: Semantic Field and Gradient Flow Word Embeddings
2D plot: 2D Semantic Field Embedding document
Prerequisites:
Unity 2022.3 LTS or newer with the Android Build Support module installed
Meta XR SDK installed via Unity Package Manager (com.meta.xr.sdk.core)
Meta Quest 3 headset with Developer Mode enabled
Android SDK and ADB installed (via Android Studio or winget install Google.PlatformTools)
Python 3.9+ with numpy, scipy, scikit-learn, umap-learn, matplotlib installed
GloVe pretrained embeddings: glove.6B.300d.txt (download from nlp.stanford.edu/projects/glove)
Existing embeddings.json and UMAP coordinate arrays from Project 1 (or regenerate using Project 1 pipeline)
1.1 Starting Point
This project reuses the full Python pipeline from Project 1. The inputs are the UMAP 3D coordinates already computed and stored in output/X_3d_umap.npy, along with words.npy and cats.npy. No changes to the embedding generation or dimensionality reduction steps are needed.
1.2 Building the KDE Scalar Fields
For each of the four categories, a Gaussian kernel density estimate is computed over the 3D UMAP coordinates. This produces a continuous scalar field — a density value at every point in a 3D grid — representing how strongly that category is present at each location in space.
Run: python scripts/build_scalar_field.py
Key parameters:
GRID_RES = 20: the grid resolution (20×20×20 = 8,000 cells per category). Higher values produce smoother clouds, but the cell count, and with it the rendering cost, grows with the cube of the resolution
bw_method = 0.5 — the KDE bandwidth, controlling how spread-out each category's density hill is. Increase for smoother, more merged clouds; decrease for tighter, more peaked regions
Output: output/scalar_fields.npy — shape (4, 20, 20, 20), one density grid per category, normalized to [0, 1]
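The KDE step above can be sketched as follows. This is a minimal illustration, not the contents of scripts/build_scalar_field.py: it assumes cats.npy holds integer labels 0 to 3 and that the grid spans the bounding box of all 64 points, and the function name build_scalar_fields is hypothetical. With a scalar bw_method, SciPy's gaussian_kde uses the data covariance multiplied by bw_method² as the kernel covariance, which is why raising it merges the clouds.

```python
import numpy as np
from scipy.stats import gaussian_kde

GRID_RES = 20    # cells per axis: 20**3 = 8000 grid points per category
BW_METHOD = 0.5  # scalar bw_method: kernel covariance = data covariance * 0.5**2


def build_scalar_fields(X, cats, n_cats=4, grid_res=GRID_RES, bw_method=BW_METHOD):
    """Return KDE densities of shape (n_cats, G, G, G), each scaled to [0, 1].

    Assumptions: cats holds integer labels 0..n_cats-1, and the grid spans the
    bounding box of all points (normalized coordinate 0 -> min, 1 -> max).
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    axes = [np.linspace(lo[d], hi[d], grid_res) for d in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")             # [x, y, z] ordering
    grid_pts = np.vstack([gx.ravel(), gy.ravel(), gz.ravel()])  # shape (3, G**3)

    fields = np.empty((n_cats, grid_res, grid_res, grid_res))
    for c in range(n_cats):
        kde = gaussian_kde(X[cats == c].T, bw_method=bw_method)  # expects (dims, n_points)
        dens = kde(grid_pts).reshape(grid_res, grid_res, grid_res)
        fields[c] = dens / dens.max()                            # category peak -> 1.0
    return fields


# Usage with synthetic data standing in for X_3d_umap.npy and cats.npy
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [3, 0, 0], [0, 3, 0], [0, 0, 3]], dtype=float)
cats = np.repeat(np.arange(4), 16)                 # 4 categories x 16 words = 64
X = centers[cats] + rng.normal(scale=0.5, size=(64, 3))
fields = build_scalar_fields(X, cats)
print(fields.shape)  # (4, 20, 20, 20)
```

Normalizing each category by its own maximum keeps all four clouds on the same [0, 1] scale, so a single density threshold in Unity treats them equally.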
1.3 Computing the Gradient Vector Field
The gradient of each scalar field gives, at every grid point, the direction in which that category's density increases most steeply. This is the mathematical basis for the flow lines: they follow the gradient uphill toward the nearest semantic peak.
Run: python scripts/compute_gradient.py
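Uses numpy.gradient on each density grid. It applies second-order central differences at interior points and one-sided differences at the grid faces, the standard finite-difference scheme also used in CFD and physics simulations. Unless a spacing argument is passed, the gradients are expressed per grid cell rather than per UMAP unit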
Output: output/gradient_fields.npy — shape (4, 3, 20, 20, 20), where axis 1 is the (dx, dy, dz) gradient vector at each grid point
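A minimal sketch of the gradient step, assuming the density grids are indexed [x, y, z] as in the KDE output. compute_gradient_fields is a hypothetical name, not necessarily what scripts/compute_gradient.py contains; a linear ramp makes the expected result easy to check.

```python
import numpy as np


def compute_gradient_fields(fields):
    """Map densities (n_cats, G, G, G) to gradients (n_cats, 3, G, G, G).

    np.gradient uses central differences in the interior and one-sided
    differences at the faces. With no spacing argument, the result is per
    grid index, not per UMAP unit.
    """
    # np.gradient on a 3D array returns one array per axis: (d/dx, d/dy, d/dz)
    return np.stack([np.stack(np.gradient(f), axis=0) for f in fields], axis=0)


# Usage: a density that rises linearly along x has gradient (1, 0, 0) everywhere
G = 20
ramp = np.broadcast_to(np.arange(G, dtype=float)[:, None, None], (G, G, G))
grads = compute_gradient_fields(np.stack([ramp] * 4))
print(grads.shape)           # (4, 3, 20, 20, 20)
print(grads[0, :, 5, 5, 5])  # [1. 0. 0.]
```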
1.4 Exporting field.json
Unity reads a single JSON file containing all four density grids and their gradient components, flattened to 1D arrays for fast loading.
Run: python scripts/export_field_json.py
Output: output/field.json
Copy to Assets/Data/field.json in Unity after generation
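One possible layout for field.json is sketched below. The schema (gridRes, categories, density, gx, gy, gz) and the function export_field_json are hypothetical; the real file must match whatever FieldLoader parses. Each category is written as an object of flat arrays because Unity's JsonUtility cannot deserialize nested arrays such as float[][]. Flattening in NumPy's default C order means a C# loader would find cell (x, y, z) at index (x * G + y) * G + z.

```python
import json
import os
import tempfile

import numpy as np


def export_field_json(fields, grads, path, decimals=5):
    """Write densities and gradients as flat arrays, one object per category.

    Hypothetical schema. Rounding to a few decimals keeps the file small.
    """
    def flat(a):
        # C-order flattening: index = (x * G + y) * G + z
        return np.round(a, decimals).ravel().tolist()

    payload = {
        "gridRes": int(fields.shape[1]),
        "categories": [
            {"density": flat(f), "gx": flat(g[0]), "gy": flat(g[1]), "gz": flat(g[2])}
            for f, g in zip(fields, grads)
        ],
    }
    with open(path, "w") as fh:
        json.dump(payload, fh)
    return payload


# Usage with random stand-ins for scalar_fields.npy and gradient_fields.npy
rng = np.random.default_rng(1)
fields = rng.random((4, 20, 20, 20))
grads = rng.normal(size=(4, 3, 20, 20, 20))
path = os.path.join(tempfile.mkdtemp(), "field.json")
export_field_json(fields, grads, path)
with open(path) as fh:
    loaded = json.load(fh)
print(len(loaded["categories"][0]["density"]))  # 8000
```

On the Unity side, a matching loader would declare [Serializable] classes with a float[] field per array and compute the same flat index.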
1.5 Generating the 2D Flow Field Plot
The 2D plot used in the study shows the same field structure in flat form: colored contour rings outlining each category's density, and black gradient flow arrows across the space. Mystery words (guilt, soldier, power) appear as gray dots labeled Dot A, Dot B, Dot C with no colored background, requiring participants to read the field structure to place them.
Run: python scripts/plot_2d.py
Outputs: output/plot_2d_flow_field.png (labeled), output/plot_2d_flow_field_no_labels.png (unlabeled for study use)
Contour rings are the 2D equivalent of the 3D fog clouds
Black flow arrows are the 2D equivalent of the 3D colored streamlines
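The plot can be approximated with Matplotlib's contour and streamplot, as in the sketch below. It assumes per-category 2D density grids on the unit square are already available, and it drives the black arrows with the gradient of the summed densities. Both are assumptions, since the document does not say which vector field scripts/plot_2d.py streams. The function name plot_flow_field and the mystery-dot positions are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # off-screen rendering, no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_flow_field(dens2d, colors, mystery_xy, out_path, levels=(0.3, 0.6, 0.9)):
    """dens2d: (n_cats, G, G) densities in [0, 1], indexed [x, y] on the unit square."""
    xs = np.linspace(0.0, 1.0, dens2d.shape[1])
    fig, ax = plt.subplots(figsize=(7, 7))
    for dens, color in zip(dens2d, colors):
        # contour and streamplot expect arrays indexed [row=y, col=x], hence .T
        ax.contour(xs, xs, dens.T, levels=list(levels), colors=[color])
    # Assumption: arrows follow the gradient of the summed category densities
    gx, gy = np.gradient(dens2d.sum(axis=0))
    ax.streamplot(xs, xs, gx.T, gy.T, color="black", density=1.2, linewidth=0.6)
    for label, (x, y) in zip(["Dot A", "Dot B", "Dot C"], mystery_xy):
        ax.scatter([x], [y], color="gray", s=40, zorder=3)
        ax.annotate(label, (x, y), xytext=(5, 5), textcoords="offset points")
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    fig.savefig(out_path, dpi=150)
    plt.close(fig)


# Usage with four synthetic Gaussian bumps
g = np.linspace(0, 1, 60)
xx, yy = np.meshgrid(g, g, indexing="ij")
centers = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
dens2d = np.stack([np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / 0.02) for cx, cy in centers])
out = os.path.join(tempfile.mkdtemp(), "plot_2d_flow_field.png")
plot_flow_field(dens2d, ["tab:red", "tab:blue", "tab:green", "tab:orange"],
                [(0.5, 0.5), (0.5, 0.2), (0.2, 0.5)], out)
```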
2.1 Project Configuration
This project inherits the full Unity setup from Project 1: the same Android build configuration, Meta XR SDK installation, and OVRCameraRig hierarchy. The only addition is the field rendering system, layered on top of the existing word sphere cloud.
If starting fresh, configure:
Platform: Android (File > Build Settings > Switch Platform)
Graphics APIs: Vulkan (primary) + OpenGLES3 (fallback)
Minimum API Level: Android 12L (API level 32)
Scripting Backend: IL2CPP
Target Architectures: ARM64 only
Color Space: Linear
2.2 Scene Hierarchy
The SemanticField scene extends the Project 1 EmbeddingViz scene with a new SemanticField parent object. The complete hierarchy:
OVRCameraRig: head tracking and controller input, with the CloudNavigator script for locomotion
EmbeddingCloud: parent of the 64 word spheres from Project 1 (kept for spatial reference)
    SemanticField: parent of all new field rendering components
        FieldLoader script: reads and parses field.json
        FieldRenderer script: spawns transparent voxels to form fog clouds
        FlowLineRenderer script: integrates and renders gradient streamlines
        FieldProbe script: samples field composition at the controller's aim point
InfoCanvas (child of CenterEyeAnchor): HUD-locked panel showing semantic composition percentages
LaserPointer: LineRenderer from the right controller anchor
LegendAnchor: CategoryLegend script
EventSystem
Directional Light
Note: SemanticField should be a child of EmbeddingCloud at local position (0, 0, 0) so voxels and word spheres share the same coordinate origin.
2.3 Core Scripts
FieldLoader.cs
Reads field.json in Awake() and stores the parsed FieldData. Provides SampleDensity(catIdx, normalizedPos) and SampleGradient(catIdx, normalizedPos), which convert a normalized [0, 1] position to a grid index and return the stored float or Vector3 value. Also provides DominantCategory(normalizedPos), which returns the category with the highest density at that point.
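The lookup logic is easy to mirror in Python, for example to check field.json offline. The sketch below reproduces nearest-cell sampling and DominantCategory under the assumption that grid point i sits at normalized coordinate i / (G - 1); the Python function names are hypothetical stand-ins for the C# methods.

```python
import numpy as np


def cell_index(norm_pos, grid_res):
    """Nearest grid index for a normalized [0, 1] position, clamped to the grid.

    Assumption: grid point i sits at normalized coordinate i / (G - 1).
    """
    p = np.clip(np.asarray(norm_pos, dtype=float), 0.0, 1.0)
    return tuple(np.rint(p * (grid_res - 1)).astype(int))


def sample_density(fields, cat_idx, norm_pos):
    """fields: (n_cats, G, G, G) -> stored density of one category."""
    return float(fields[cat_idx][cell_index(norm_pos, fields.shape[1])])


def sample_gradient(grads, cat_idx, norm_pos):
    """grads: (n_cats, 3, G, G, G) -> stored (dx, dy, dz) vector."""
    ix, iy, iz = cell_index(norm_pos, grads.shape[2])
    return grads[cat_idx, :, ix, iy, iz]


def dominant_category(fields, norm_pos):
    """Index of the category with the highest density in the nearest cell."""
    idx = cell_index(norm_pos, fields.shape[1])
    return int(np.argmax(fields[(slice(None),) + idx]))


# Usage: a single nonzero cell for category 2 in the x = 1, y = 0 corner
fields = np.zeros((4, 20, 20, 20))
fields[2, 19, 0, 10] = 0.8
print(sample_density(fields, 2, (1.0, 0.0, 0.52)))   # 0.8
print(dominant_category(fields, (0.98, 0.01, 0.53)))  # 2
```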
FieldRenderer.cs
Iterates over all 20×20×20 grid cells across all four categories. For each cell where density exceeds the threshold (default 0.25), instantiates a transparent sphere voxel at the corresponding world position. Voxel color is set per-category via MaterialPropertyBlock; alpha is scaled by density so denser regions appear more opaque. Scale jitter (0.7–1.1× the grid cell size) and position jitter (insideUnitSphere × 0.3 × step) break up the visible grid pattern. The Y button on the left controller cycles the density threshold between 0.1 and 0.3 for live tuning during a session.
Key Inspector settings:
Density Threshold: 0.25 (raise to thin clouds, lower to fill space)
Voxel Alpha Scale: 0.1–0.2 (keep low so individual voxels are nearly invisible and only overlap creates opacity)
Cloud Size Meters: 2.5 (must match EmbeddingCloud cloudSizeMeters)
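The voxel placement rule described above can be sketched in Python as follows, assuming the field is centered on SemanticField's origin, spans Cloud Size Meters per axis, and places grid point i at i / (G - 1). layout_voxels is a hypothetical name; in Unity each tuple would become an instantiated FieldVoxel whose color and alpha are set through a MaterialPropertyBlock.

```python
import numpy as np


def layout_voxels(fields, threshold=0.25, alpha_scale=0.15, cloud_size=2.5, seed=None):
    """Return (category, position, scale, alpha) for every cell above threshold.

    Assumptions: the field is centered on SemanticField's origin, spans
    cloud_size meters per axis, and grid point i sits at i / (G - 1).
    """
    rng = np.random.default_rng(seed)
    G = fields.shape[1]
    step = cloud_size / (G - 1)  # meters between neighboring grid points
    voxels = []
    for c, ix, iy, iz in np.argwhere(fields > threshold):
        d = fields[c, ix, iy, iz]
        pos = (np.array([ix, iy, iz]) / (G - 1) - 0.5) * cloud_size
        # Random point inside the unit sphere by rejection, like insideUnitSphere
        while True:
            j = rng.uniform(-1.0, 1.0, 3)
            if j @ j <= 1.0:
                break
        pos = pos + j * 0.3 * step                 # position jitter
        scale = rng.uniform(0.7, 1.1) * step       # scale jitter
        voxels.append((int(c), pos, scale, d * alpha_scale))  # denser -> more opaque
    return voxels


# Usage: mostly low densities, so only a fraction of the 32,000 cells pass
fields = np.random.default_rng(2).random((4, 20, 20, 20)) ** 8
voxels = layout_voxels(fields, seed=0)
print(len(voxels) == int((fields > 0.25).sum()))  # True
```

The returned count is simply the number of (category, cell) pairs above the threshold, which is what drives the performance limits of this approach.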
FlowLineRenderer.cs
Seeds streamlines by guaranteeing numStreamlines/4 seeds per category (default 5 per category, 20 total), placed randomly within the [0.15, 0.85] normalized range. A seed is accepted only if its density exceeds minDensityToStart and its gradient magnitude exceeds 0.002. Integration uses RK2 (the midpoint method) along the normalized gradient field with direction smoothing: each step's direction is a blend of 25% the new gradient direction and 75% the previous direction, which suppresses the zigzag artifact that pure normalized-gradient steps produce. Each line is rendered as a LineRenderer that tapers from a thick startWidth to a thin endWidth, with the category color fading to transparent at the end. A small capsule prefab placed at the line endpoint serves as an arrowhead; it is rotated 90° around X so that its long axis (a capsule's local Y) lies along the flow direction.
Key Inspector settings:
Num Streamlines: 20 (5 per category)
Steps Per Line: 40
Step Size: 0.006 (in normalized [0,1] space)
Min Density To Start: 0.03
Line Width: 0.008 meters
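The seeding and integration rules above can be sketched in Python, taking the density and gradient samplers as plain functions. The names seed_points and trace_streamline are hypothetical, and two details are assumptions rather than documented behavior: integration also stops when the gradient falls below 0.002 or the point leaves the [0, 1] volume, and seeding gives up after max_tries attempts per category. Passing a fixed seed makes the layout reproducible.

```python
import numpy as np


def seed_points(density_at, grad_at, n_cats=4, num_streamlines=20,
                min_density=0.03, min_grad=0.002, seed=0, max_tries=1000):
    """num_streamlines // n_cats accepted seeds per category, drawn in [0.15, 0.85]^3."""
    rng = np.random.default_rng(seed)  # fixed seed -> same seeds every run
    seeds = []
    for c in range(n_cats):
        found, tries = 0, 0
        while found < num_streamlines // n_cats and tries < max_tries:
            tries += 1
            p = rng.uniform(0.15, 0.85, 3)
            if density_at(c, p) > min_density and np.linalg.norm(grad_at(c, p)) > min_grad:
                seeds.append((c, p))
                found += 1
    return seeds


def trace_streamline(grad_fn, start, steps=40, step_size=0.006, blend=0.25, min_grad=0.002):
    """RK2 (midpoint) steps along the normalized gradient, with direction smoothing."""
    p = np.asarray(start, dtype=float)
    points, prev_dir = [p.copy()], None
    for _ in range(steps):
        g1 = grad_fn(p)
        n1 = np.linalg.norm(g1)
        if n1 < min_grad:                      # flat region or peak: stop
            break
        mid = p + 0.5 * step_size * g1 / n1    # half step along the current direction
        g2 = grad_fn(mid)                      # gradient sampled at the midpoint
        n2 = np.linalg.norm(g2)
        if n2 < min_grad:
            break
        new_dir = g2 / n2
        # 25% new direction, 75% previous; the sum of two unequally weighted
        # unit vectors is never zero, so the normalization below is safe
        d = new_dir if prev_dir is None else (1 - blend) * prev_dir + blend * new_dir
        d = d / np.linalg.norm(d)
        p = p + step_size * d
        if np.any(p < 0.0) or np.any(p > 1.0):  # left the field volume
            break
        points.append(p.copy())
        prev_dir = d
    return np.array(points)


# Usage with a toy field whose density peaks at the cube center
center = np.full(3, 0.5)


def density_at(c, p):
    return float(np.exp(-np.sum((p - center) ** 2) / 0.05))


def grad_at(c, p):
    return center - p  # points uphill toward the peak


seeds = seed_points(density_at, grad_at)
line = trace_streamline(lambda p: grad_at(0, p), (0.1, 0.5, 0.5))
print(len(seeds), line.shape)  # 20 (41, 3)
```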
FieldProbe.cs
Every frame, computes a sample point probeDepth meters (default 3.0 m) along the right controller's forward ray. Converts this world position to normalized field coordinates, accounting for SemanticField's world position and cloudSize. Calls SampleDensity for each of the four categories, normalizes the values to percentages, and updates the InfoCanvas TextMeshProUGUI with a colored percentage-bar display. The panel is HUD-locked: InfoCanvas is a child of CenterEyeAnchor, so it stays fixed in the top-left of the user's view.
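The probe's coordinate conversion reduces to a few lines. This sketch assumes SemanticField's position is the center of the field, that the field is axis-aligned with no rotation or scale, and that it spans cloud_size meters per axis; probe_composition is a hypothetical name, and it uses nearest-cell sampling like the current SampleDensity.

```python
import numpy as np


def probe_composition(ctrl_pos, ctrl_forward, field_center, cloud_size, fields,
                      probe_depth=3.0):
    """Return per-category percentages at the probe point, or None if outside.

    Assumptions: field_center is SemanticField's world position and the grid
    spans cloud_size meters per axis centered on it (no rotation or scale).
    """
    fwd = np.asarray(ctrl_forward, dtype=float)
    world = np.asarray(ctrl_pos, dtype=float) + probe_depth * fwd / np.linalg.norm(fwd)
    norm = (world - np.asarray(field_center, dtype=float)) / cloud_size + 0.5
    if np.any(norm < 0.0) or np.any(norm > 1.0):
        return None                                   # aiming outside the field
    G = fields.shape[1]
    ix, iy, iz = np.rint(norm * (G - 1)).astype(int)  # nearest grid cell
    dens = fields[:, ix, iy, iz]
    total = dens.sum()
    if total <= 0.0:
        return np.zeros(len(dens))                    # empty space: no composition
    return 100.0 * dens / total


# Usage: the probe lands at normalized (0.6, 0.6, 0.6), i.e. grid cell (11, 11, 11)
fields = np.zeros((4, 20, 20, 20))
fields[:, 11, 11, 11] = [0.6, 0.2, 0.2, 0.0]
pct = probe_composition(ctrl_pos=(0.25, 1.45, -1.5), ctrl_forward=(0, 0, 1),
                        field_center=(0, 1.2, 1.25), cloud_size=2.5, fields=fields)
```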
2.4 Voxel Prefab Setup
Create a sphere primitive → add a Standard material with Rendering Mode set to Transparent → set Albedo to white (runtime color is set per-voxel via MaterialPropertyBlock) → save as Assets/Prefabs/FieldVoxel. Drag into FieldRenderer's Voxel Prefab slot.
2.5 Streamline Material Setup
Create a new material → Shader: Particles/Standard Unlit → Color: white (runtime color set per-line via LineRenderer.startColor and endColor) → save as Assets/Materials/StreamlineMat. Drag into FlowLineRenderer's Streamline Material slot.
2.6 Building and Deploying
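Connect the headset over USB, accept the USB debugging prompt inside the headset, and confirm it is listed as a device:
adb devices
In Unity: File > Build Settings > Build → save as SemanticField.apk
adb install -r "<path to>\SemanticField.apk"
On the headset, open the App Library and launch the app from the Unknown Sources category.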
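Performance notes and known limitations:
Voxel count is bounded by the number of candidate cells, 4 × G³ (32,000 at G = 20; 108,000 at G = 30), so it grows cubically with grid resolution. At grid resolution 30 the scene has 27,000+ voxels, which causes severe frame rate drops on Quest hardware. Grid resolution 20 with threshold 0.25 keeps the voxel count under ~3,000 and maintains acceptable performance.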
The fog clouds and word spheres share the same coordinate system only when SemanticField is a child of EmbeddingCloud at local position (0,0,0). If either object's Transform is moved independently, clouds and spheres will misalign.
Streamlines are seeded randomly each time GenerateStreamlines() is called, so the specific lines visible change between sessions. Calling Random.InitState() with a fixed seed before the call would make them reproducible (the older Random.seed property is obsolete).
The density threshold toggle (Y button) calls RenderField(), which destroys and reinstantiates all voxel GameObjects; this causes a visible 1–2 second freeze on the Quest. A shader-based approach, in which voxels below the threshold are hidden on the GPU instead of being destroyed and re-created, would eliminate this.
Flow lines in 3D are sparser than the 2D streamplot, which draws arrows across the entire grid. Increasing numStreamlines to 60+ would approximate the 2D density but increases draw call count proportionally.
The probe panel reads field.json values from the nearest grid cell instead of interpolating between cells, so the displayed values can jump discontinuously when the aim point crosses a cell boundary. Implementing trilinear interpolation in SampleDensity would smooth this.
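A trilinear version of SampleDensity could look like the sketch below, shown in Python for brevity; sample_trilinear is a hypothetical name and assumes grid point i sits at normalized coordinate i / (G - 1). It blends the eight grid points around the sample, so the value changes continuously as the aim point crosses cell boundaries, and it reproduces any linear field exactly.

```python
import numpy as np


def sample_trilinear(grid, norm_pos):
    """Trilinearly interpolate a (G, G, G) grid at a normalized [0, 1] position.

    Assumption: grid point i sits at normalized coordinate i / (G - 1).
    """
    G = grid.shape[0]
    p = np.clip(np.asarray(norm_pos, dtype=float), 0.0, 1.0) * (G - 1)
    i0 = np.minimum(np.floor(p).astype(int), G - 2)  # lower corner, kept inside the grid
    tx, ty, tz = p - i0                               # fractional offsets in [0, 1]
    x0, y0, z0 = i0
    c = grid[x0:x0 + 2, y0:y0 + 2, z0:z0 + 2]         # the 8 surrounding samples
    c = c[0] * (1 - tx) + c[1] * tx                   # collapse x -> shape (2, 2)
    c = c[0] * (1 - ty) + c[1] * ty                   # collapse y -> shape (2,)
    return float(c[0] * (1 - tz) + c[1] * tz)         # collapse z -> scalar


# Usage: a linear field 2x + 3y - z (in grid-index units) is reproduced exactly
G = 20
ix, iy, iz = np.meshgrid(np.arange(G), np.arange(G), np.arange(G), indexing="ij")
grid = 2.0 * ix + 3.0 * iy - 1.0 * iz
print(sample_trilinear(grid, (0.5, 0.25, 1.0)))  # 14.25
```

Applying the same function to each of the three gradient components would smooth SampleGradient in the same way.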