For this semester, we'll be focusing on finding and evaluating software for exploratory scientific visualization. The data types below can be used in your project and tutorials. Please add more data types and examples as you find them. Be sure to document any data you add to this page in your journal.
For an abbreviated list of common data types, a brief definition of each type, and recommended/feasible software to use with each type, please see the Software Recommendations by Data Type page.
Please note: many of the following files are very large (we're talking up to around a hundred MB for some of these)! Additionally, this page is a continual work in progress. The categories of data are not exhaustive and are listed in no particular order, so be prepared for the page to be somewhat messy.
Description: 2D array of data values that can be reduced to an RGB image before viewing or interactively during viewing.
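As a rough illustration of "reducing to an RGB image", the sketch below maps each scalar in a 2D array to a color by linear interpolation between two endpoint colors. The function name `to_rgb`, the blue-to-red ramp, and the sample values are assumptions made for illustration; real viewers typically offer many colormaps.

```python
def to_rgb(values, low=(0, 0, 255), high=(255, 0, 0)):
    """Map a 2D list of scalars to RGB triples by linear interpolation."""
    flat = [v for row in values for v in row]
    vmin, vmax = min(flat), max(flat)
    span = (vmax - vmin) or 1.0  # avoid dividing by zero on constant data

    def color(v):
        t = (v - vmin) / span  # 0.0 at the minimum, 1.0 at the maximum
        return tuple(round(a + t * (b - a)) for a, b in zip(low, high))

    return [[color(v) for v in row] for row in values]


# Placeholder 2x2 data array, not taken from any data set on this page.
grid = [[0.0, 0.5], [1.0, 2.0]]
image = to_rgb(grid)
```

The smallest value comes out pure blue and the largest pure red; everything else falls in between.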
Data:
Simple Animals (from Felice Frankel at MIT)
Cell Image Library - 2D images of many different types of cells
Electron Microscopy Public Image Archive (EMPIAR)
Lots of 2D microscopy data
Description: A magnitude and direction at each point in 3D space. Can be 3D flow. Can be time-varying.
Data:
Vibrating Cylinder Flow - George Karniadakis and Zhicheng Wang
Contains 10 .plt files each of which contains 12 components of data:
(x, y, z) position coordinates
(u, v, w) velocity components in the x, y, z directions
p is pressure
vorticity_x, vorticity_y, vorticity_z
Q, second invariant of the velocity gradient
Building Downwash Simulations
Turbulence Decay
Update: Link no longer works
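To make "a magnitude and direction at each point" concrete, here is a small sketch that splits (u, v, w) velocity components into a speed and a unit direction vector. The sample values are placeholders and the dictionary layout is an assumption; this does not read .plt files.

```python
import math


def magnitude(vx, vy, vz):
    """Euclidean length of a 3D vector."""
    return math.sqrt(vx * vx + vy * vy + vz * vz)


def direction(vx, vy, vz):
    """Unit vector pointing the same way, or zeros for a zero vector."""
    m = magnitude(vx, vy, vz)
    return (vx / m, vy / m, vz / m) if m else (0.0, 0.0, 0.0)


# Placeholder samples: position (x, y, z) plus velocity (u, v, w).
samples = [
    {"x": 0.0, "y": 0.0, "z": 0.0, "u": 3.0, "v": 4.0, "w": 0.0},
    {"x": 1.0, "y": 0.0, "z": 0.0, "u": 1.0, "v": 2.0, "w": 2.0},
]
speeds = [magnitude(s["u"], s["v"], s["w"]) for s in samples]
directions = [direction(s["u"], s["v"], s["w"]) for s in samples]
```

The same `magnitude` helper would apply to the three vorticity components if you wanted a single scalar to color by.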
Description: data that represents a collection of points in 2D or 3D space.
Data:
las files: Lidar data comes in many forms, but las files are one of the most popular Lidar data formats. An las file is a binary file that contains point cloud data, stored as X, Y, Z coordinates, and a header that contains file metadata, RGB information, GIS information, and many other optional fields. For more detailed information on las files, see the official las specification.
Ecology data from EEB Professor Jim Kellner and postdoc Loren Albert:
More information
Data
Update: Links no longer work
NEONDS Sample LiDAR Point Cloud Data:
Sample Velodyne .pcap files:
City of Montreal, Canada LiDAR
http://donnees.ville.montreal.qc.ca/dataset/lidar-aerien-2015 (note that this page is written in French)
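For a sense of what the las header and point records look like on disk, the sketch below reads a few fixed-offset header fields and decodes one point using only the standard library. The byte offsets follow my reading of the LAS 1.2 public header block; treat them as an assumption to check against the official specification. The file is a synthetic, in-memory example with placeholder values. For real work a library such as laspy is the more practical choice.

```python
import struct


def parse_las_header(data: bytes) -> dict:
    """Read a few fixed-offset fields from a LAS public header block."""
    if data[:4] != b"LASF":
        raise ValueError("missing LASF signature")
    major, minor = struct.unpack_from("<BB", data, 24)
    header_size, point_offset = struct.unpack_from("<HI", data, 94)
    # The count read here is the 32-bit "legacy" point count field.
    point_format, record_length, point_count = struct.unpack_from("<BHI", data, 104)
    return {
        "version": f"{major}.{minor}",
        "header_size": header_size,
        "point_offset": point_offset,
        "point_format": point_format,
        "record_length": record_length,
        "point_count": point_count,
        "scale": struct.unpack_from("<3d", data, 131),
        "offset": struct.unpack_from("<3d", data, 155),
    }


def read_xyz(data: bytes, header: dict, index: int) -> tuple:
    """Decode point `index`: stored int32 values times scale plus offset."""
    start = header["point_offset"] + index * header["record_length"]
    raw = struct.unpack_from("<3i", data, start)
    return tuple(r * s + o for r, s, o in zip(raw, header["scale"], header["offset"]))


# Synthetic one-point file built in memory (placeholder values, not survey data).
buf = bytearray(227)
buf[0:4] = b"LASF"
struct.pack_into("<BB", buf, 24, 1, 2)        # version 1.2
struct.pack_into("<HI", buf, 94, 227, 227)    # header size, offset to points
struct.pack_into("<BHI", buf, 104, 0, 20, 1)  # format 0, 20-byte records, 1 point
struct.pack_into("<3d", buf, 131, 0.01, 0.01, 0.01)
struct.pack_into("<3d", buf, 155, 100.0, 200.0, 0.0)
las_bytes = bytes(buf) + struct.pack("<3i", 150, 250, 1000) + bytes(8)

las_header = parse_las_header(las_bytes)
first_point = read_xyz(las_bytes, las_header, 0)
```

Storing coordinates as scaled integers is why two las files of the same area can differ in precision: the scale factors in the header set the smallest representable step.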
Point Cloud Visualization Software:
Point Cloud Processing Software:
Comparisons:
Tutorials:
Tutorial on Python-PCL
Tutorial on Converting .las files to .out files: Laspy Tutorial
Tutorial on Lidar Paraview to Blender
Tutorial on viewing .pcap files in Veloview.
Papers:
Summaries of papers on processing and rendering LiDAR data.
Description: This might be an amalgam of different data types, including collections of stars, collections of galaxies, outward-looking imagery (as from a telescope), or inward-looking imagery (as from a satellite of a planet). Some such data is 2D imaging data, but with a particular underlying space, e.g., Earth, the moon, or Mars.
Data:
NASA Global Imagery Browser Services (GIBS):
GIBS is a database of satellite imagery data collected by NASA JPL. The database supports an extensive REST API, but it can be somewhat complicated to use.
GIBS returns satellite data in the form of tiles or maps (in Mercator projection)
Earthdata is a massive database of atmospheric, land, and ocean data operated by NASA
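When requesting Mercator tiles, you usually need to turn a longitude/latitude into tile indices. The sketch below uses the common XYZ "slippy map" convention for Web Mercator. Whether a particular GIBS layer follows exactly this tiling scheme is an assumption, so check the layer's documentation before relying on it.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return (x, y) tile indices in the XYZ Web Mercator convention."""
    n = 2 ** zoom  # tiles per side at this zoom level
    lat = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp so that edge coordinates (e.g. lon = 180) stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


origin_tile = lonlat_to_tile(0.0, 0.0, 2)
```

In this convention y counts downward from the north, so the equator sits halfway down the grid.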
Software:
OpenSpace software views some kinds of data of these types, and it has the potential to generalize to more. It almost runs in the Yurt and does run on desktops and HMDs.
Description: MRI imaging data comes in many forms, often depending on the type of MRI machine used to scan a patient. Two of the most common MRI data formats are NIfTI and DICOM. Luckily, most MRI processing software can easily convert between these two formats, so we will focus on visualizing and processing only NIfTI files. For an in-depth comparison between NIfTI and DICOM, see Medical Imaging Formats.
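To get a feel for what a NIfTI file stores, the sketch below reads the volume shape and voxel size from a NIfTI-1 header using only the standard library. The field offsets reflect my understanding of the 348-byte NIfTI-1 header layout and should be treated as an assumption; the header here is synthetic. Compressed .nii.gz files would need to be decompressed first, and a library such as nibabel is the usual tool for real data.

```python
import struct


def parse_nifti1_header(header: bytes) -> dict:
    """Extract shape, voxel size, and data type code from a NIfTI-1 header."""
    for endian in ("<", ">"):
        (sizeof_hdr,) = struct.unpack_from(endian + "i", header, 0)
        if sizeof_hdr == 348:  # the first field doubles as a byte-order check
            break
    else:
        raise ValueError("not a NIfTI-1 header")
    dim = struct.unpack_from(endian + "8h", header, 40)
    datatype, bitpix = struct.unpack_from(endian + "2h", header, 70)
    pixdim = struct.unpack_from(endian + "8f", header, 76)
    (vox_offset,) = struct.unpack_from(endian + "f", header, 108)
    ndim = dim[0]  # dim[0] holds the number of dimensions in use
    return {
        "shape": dim[1:1 + ndim],
        "voxel_size": pixdim[1:1 + ndim],
        "datatype": datatype,
        "bitpix": bitpix,
        "vox_offset": int(vox_offset),
        "magic": header[344:348],
    }


# Synthetic header with placeholder dimensions (not from BRATS or ADNI).
buf = bytearray(348)
struct.pack_into("<i", buf, 0, 348)
struct.pack_into("<8h", buf, 40, 3, 240, 240, 155, 1, 1, 1, 1)
struct.pack_into("<2h", buf, 70, 4, 16)  # code 4 with 16 bits per voxel
struct.pack_into("<8f", buf, 76, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0)
struct.pack_into("<f", buf, 108, 352.0)
buf[344:348] = b"n+1\x00"

nifti_info = parse_nifti1_header(bytes(buf))
```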
Data:
Brain Tumor Segmentation Challenge (BRATS) - BRATS is a comprehensive MRI brain tumor data set comprised of 243 MRI scans, expertly labeled based on tumor pathology.
Alzheimer's Disease Neuroimaging Initiative (ADNI) - ADNI contains a collection of over 3000 MRI scans of Alzheimer's patients, captured over several years and with varying MRI scanners.
Note that both the BRATS and ADNI data sets are not public and must be requested.
OpenNeuro is a free and open platform that allows users to share, download, and use BIDS-compliant MRI, PET, MEG, EEG, and iEEG data. BIDS stands for the Brain Imaging Data Structure, the emerging standard for the organization of neuroimaging data.
Software:
Papers:
Human brain functional MRI and DTI visualization with virtual reality: an in-depth overview of VR visualization of MRI data.
The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS): an overview of the BRATS data set.
Medical Imaging Formats: a comprehensive comparison of the most popular MRI data formats.
Similar to:
CLARITY Brain Imaging
CT Brain imaging
Data:
Twitter API
Probably the easiest social media API to use; clearly defined data types and properties (just Tweets associated with users and their info)
Access Tweets in realtime; filter by many different properties
Access previous Tweets for other types of analysis
Apply for Twitter Developer tools here
[2023 Feb. Update Regarding Twitter API]
On February 9, 2023, Twitter ended its free access offering. Twitter API v2 will offer 3 tiers (one coming soon).
As of February 24, 2023, Twitter's official documentation lists the prices for these tiers as TBD.
Essential Plan will offer
1 project
1 app per project
retrieving up to 500k Tweets per month
Access to Twitter API v2 and standard v1.1
No access to Twitter API premium v1.1 and enterprise
Elevated Plan will offer
1 project
3 apps per project
retrieving up to 3 million Tweets per month
Access to API standard 1.1, premium v1.1, enterprise
Academic Research
1 project
1 App per project
retrieve up to 10 million Tweets per month
Access to API standard 1.1, premium v1.1, enterprise
Details on these access levels can be found here
2012 US election tweets - Microsoft Research
Daily geolocated tweets - Microsoft Research
Citation network dataset
Contains citation data extracted from Microsoft Academic Graph and other academic databases
DIMACS Shortest Path Challenge - Center for Discrete Mathematics and Theoretical Computer Science
Update: Link no longer works
Graph data that models distances between many major U.S. towns and cities (23,947,347 nodes)
US Flight Data (1990 - 2009)
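For distance graphs like the DIMACS road networks, a typical first step is loading the arcs and running a shortest-path search. The sketch below assumes the DIMACS challenge text layout ("c" comment lines, a "p sp" problem line, and "a u v w" arc lines) and uses a tiny made-up graph. It is a plain Dijkstra implementation, not tuned for graphs with millions of nodes.

```python
import heapq
from collections import defaultdict


def parse_dimacs_gr(text):
    """Build an adjacency list from 'a u v w' arc lines; other lines are skipped."""
    graph = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "a":
            u, v, w = int(parts[1]), int(parts[2]), int(parts[3])
            graph[u].append((v, w))
    return graph


def dijkstra(graph, source):
    """Shortest distance from `source` to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Made-up 4-node example in the assumed .gr layout.
SAMPLE_GR = """c tiny example
p sp 4 5
a 1 2 7
a 1 3 2
a 3 2 3
a 2 4 1
a 3 4 8
"""
distances = dijkstra(parse_dimacs_gr(SAMPLE_GR), 1)
```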
Description: 3D data stored as polygonal meshes
Found in many 3D file extensions that hold 3D data points: OBJ, DAE, FBX, etc
Programs that use and view polygonal model data that could be YURT usable:
Unity, Blender, Unreal, Paraview
Unlikely to be YURT usable:
Maya, Adobe Dimension
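As an example of how a polygonal mesh is stored, the sketch below reads vertices and faces from OBJ text. It handles only "v" and "f" lines and positive indices; normals, texture coordinates, materials, and relative (negative) indices are ignored. The sample mesh is made up.

```python
def parse_obj(text):
    """Return (vertices, faces) from OBJ text; faces hold 0-based vertex indices."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(float(c) for c in parts[1:4]))
        elif parts[0] == "f":
            # Each corner is "v", "v/vt", "v//vn", or "v/vt/vn"; OBJ counts from 1.
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
    return vertices, faces


# Made-up unit square split into two triangles.
SAMPLE_OBJ = """# unit square
v 0 0 0
v 1 0 0
v 1 1 0
v 0 1 0
f 1 2 3
f 1/1 3/3 4/4
"""
vertices, faces = parse_obj(SAMPLE_OBJ)
```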
Data:
YURT Supernova (from Elaine Jiang)
World Vector Shorelines
http://www.earthmodels.org/data-and-tools/coastlines/World_Vector_Shorelines.vtp
Update: Link no longer works
Surface model of the world's shorelines
City of Adelaide 3D Model
Open City Model
Smithsonian Open Access
RISD Nature Lab models
RISD hosts a growing library of their nature lab models, which are available to see in person at the Nature Lab, or online in Sketchfab. They have a wide variety of uses for the digital collection and they keep track of those who reach out and provide feedback on their model. The collection is not complete, but they’re slowly scanning and uploading models for each item they have at the lab.
Description: Series of data points indexed by time.
Data:
CIT time series data
Human heart rate data - MIT
Global Forest Watch
Description: Data that captures various genetic information such as variances in genome sequences, DNA sequences, and other genetics information
Data:
Genome variant calls of chromosome 22 - Microsoft Research
1000 Genomes (full genomes with population, location metadata)
Microsoft Genomics Data Lake
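Variant calls are commonly distributed as VCF text files; whether each data set above uses VCF is an assumption, so check the download page. The sketch below reads the first five tab-separated columns (chromosome, position, ID, reference allele, alternate alleles) from a made-up sample and skips the header lines that start with "#".

```python
def parse_vcf(text):
    """Return one dict per variant line, keeping only the first five columns."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # meta-information and column-header lines
        chrom, pos, variant_id, ref, alt = line.split("\t")[:5]
        records.append({
            "chrom": chrom,
            "pos": int(pos),
            "id": variant_id,
            "ref": ref,
            "alt": alt.split(","),  # several alternate alleles are comma-separated
        })
    return records


# Made-up variants for illustration only.
SAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "22\t1000\t.\tA\tG\t50\tPASS\t.",
    "22\t2000\t.\tC\tT,G\t99\tPASS\t.",
])
variants = parse_vcf(SAMPLE_VCF)
```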
Description: Output from cameras with proximity sensors contains an extra channel for depth. Its value ranges from 0 to 255, just like a color channel. It is often visualized alone as a grayscale image.
Data:
https://sumochallenge.org has output from a 360º scan around a room
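To show what "visualized alone as grayscale" can look like in practice, the sketch below pulls the depth channel out of RGB-D pixels and writes it as a plain-text PGM image, a simple grayscale format most image viewers can open. The (r, g, b, depth) tuple layout and the pixel values are assumptions for illustration.

```python
def depth_to_pgm(depth_rows):
    """Encode a 2D list of 0-255 depth values as a plain (P2) PGM string."""
    height, width = len(depth_rows), len(depth_rows[0])
    lines = ["P2", f"{width} {height}", "255"]
    for row in depth_rows:
        # Clamp in case a sensor reports values outside the 0-255 range.
        lines.append(" ".join(str(max(0, min(255, int(d)))) for d in row))
    return "\n".join(lines) + "\n"


# Placeholder 2x2 RGB-D image: each pixel is (r, g, b, depth).
rgbd = [
    [(200, 10, 10, 0), (200, 10, 10, 128)],
    [(10, 200, 10, 255), (10, 10, 200, 64)],
]
depth = [[pixel[3] for pixel in row] for row in rgbd]
pgm_text = depth_to_pgm(depth)
```

Writing `pgm_text` to a file ending in .pgm gives an image where brighter pixels correspond to larger depth values.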
Description: Open standard format for representing geographic data in JSON files.
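A minimal sketch of what such a file looks like and how to read it with the standard library follows. The feature names and coordinates are placeholders. One detail that often trips people up: positions are stored as [longitude, latitude], not the other way around.

```python
import json

# Placeholder FeatureCollection with one Point and one Polygon.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "sample station"},
      "geometry": {"type": "Point", "coordinates": [-71.40, 41.83]}
    },
    {
      "type": "Feature",
      "properties": {"name": "sample region"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]
      }
    }
  ]
}
"""


def summarize_features(text):
    """Return (name, geometry type) for every feature."""
    collection = json.loads(text)
    return [
        (f["properties"].get("name"), f["geometry"]["type"])
        for f in collection["features"]
    ]


def point_lon_lat(text):
    """Return (lon, lat) for every Point feature; longitude comes first."""
    collection = json.loads(text)
    return [
        tuple(f["geometry"]["coordinates"][:2])
        for f in collection["features"]
        if f["geometry"]["type"] == "Point"
    ]


feature_summary = summarize_features(SAMPLE_GEOJSON)
points = point_lon_lat(SAMPLE_GEOJSON)
```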
Visualization Tools: VR-Viz
Data:
World Air Pollution Dataset
Outline of Brazilian States
OpenStreetMap
Map of Streets and Terrain Types of the entire world
Tools:
Refer to Postman Scraping with Ambee for more info on how to load data easily (may need to create new temp email account if calls to API exceed free version limits)
Global Forest Watch
Description: Global Forest Watch (GFW) is an online platform that provides data and tools for monitoring forests. By harnessing cutting-edge technology, GFW allows anyone to access near real-time information about where and how forests are changing around the world. Important founding partners include Google, USAID, the University of Maryland, and Esri.
Data Type:
Tree Cover loss
Tree Cover gain
Liza Kolev 2023
Description: A comma-separated values file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas.
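A short sketch of reading such a file with Python's built-in `csv` module follows. The column names and values are placeholders, not taken from the data sets below. Using `csv` rather than splitting on commas matters when a field itself contains a comma, as in the quoted country name here.

```python
import csv
import io

# Placeholder records; the third row has a missing value.
SAMPLE_CSV = (
    "country,year,freshwater_m3_per_capita\n"
    '"Exampleland, Rep.",2007,1200.5\n'
    "Samplestan,2007,850.0\n"
    "Samplestan,2012,\n"
)


def load_records(text):
    """Parse CSV text into dicts, converting types and keeping gaps as None."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        value = row["freshwater_m3_per_capita"]
        records.append({
            "country": row["country"],
            "year": int(row["year"]),
            "value": float(value) if value else None,
        })
    return records


records = load_records(SAMPLE_CSV)
```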
Data:
Renewable Internal Freshwater Resources Across World
Only has data for 2007, 2012, and 2014 (when looked at in 2023)
Percentage Population Using Safely Managed Water Services Across World
Liza Kolev 2023
Description: The shapefile format is a geospatial vector data format for geographic information system software. It is developed and regulated by Esri as a mostly open specification for data interoperability among Esri and other GIS software products.
Data:
World boundaries that you can download according to how specific and labeled you want it to be
Data will download in a zip file with different kinds of datasets. Shapefile ends in .shp.
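Before loading a .shp file into a full GIS tool, it can help to check what kind of shapes it holds and what area it covers. The sketch below reads those fields from the 100-byte main file header. The offsets and the mixed byte order reflect my understanding of the Esri shapefile description and should be treated as an assumption; the header is synthetic.

```python
import struct

# Common 2D shape type codes (the format defines more, including Z and M variants).
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def parse_shp_header(header: bytes) -> dict:
    """Read file length, shape type, and bounding box from a .shp header."""
    (file_code,) = struct.unpack_from(">i", header, 0)  # big-endian
    if file_code != 9994:
        raise ValueError("not a shapefile main file")
    (length_words,) = struct.unpack_from(">i", header, 24)  # in 16-bit words
    version, shape_type = struct.unpack_from("<2i", header, 28)  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack_from("<4d", header, 36)
    return {
        "file_bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, f"code {shape_type}"),
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Synthetic header-only file with a whole-world bounding box (placeholder).
buf = bytearray(100)
struct.pack_into(">i", buf, 0, 9994)
struct.pack_into(">i", buf, 24, 50)
struct.pack_into("<2i", buf, 28, 1000, 5)
struct.pack_into("<4d", buf, 36, -180.0, -90.0, 180.0, 90.0)

shp_info = parse_shp_header(bytes(buf))
```

With a downloaded zip, the standard `zipfile` module can supply the first 100 bytes of the .shp member without extracting everything.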
Soccer/Football Data
Connor Flick, 03/10/2025
Description: Data measuring commute times, congestion, or incidents on road and commute networks. Data may be real-time (by connecting to an API) or aggregated, providing information over time. This data may integrate well with maps or Geospatial data and models.
Data:
Manhattan Congestion Pricing Data, commute times for selected routes
Certain endpoints (/routes.xlsx, /routes_janurary.xlsx, /routes_2024.xlsx) can be used to extract data for different timescales
Aarav Kumar, 03/10/2025
World Bank Data (1992 - 2014) - Renewable internal freshwater resources per capita (by country, 1997-2014)
Worldometers - Water Use by Country (yearly and daily water use by country in m^3)
FAO Aquastat Data - Water-use data by country and year
Better summarized by the World Population Review (total water withdrawal per capita by country in billion m^3, 2020)
OECD - Water Withdrawal Data (per capita by country in million m^3, 2021)
Colby Rees, 04/02/2025
eBird data on birds spotted at specific locations, during specific months
Create an account, then request access to the data. Include a brief description of how you plan to use the data. The request should be approved within a few days.
Movebank: Animal migration data
Data Zone by Bird Life: bird species and location data
Macaulay Library Cornell Lab: bird sounds
Description: MIDI (Musical Instrument Digital Interface) data allows musical instruments and other hardware to communicate with each other. MIDI holds information on notes and how they are played (for example note on, velocity, modulation, note duration, note off, etc.). In this form, MIDI attempts to digitally communicate the features of a musical performance.
For details on MIDI in VR, see this link
For details on the MIDI standard and computer music, see this link
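To illustrate how note on, note off, and velocity are encoded, the sketch below decodes three-byte MIDI channel messages and converts a note number to a pitch in Hz. It covers only a few message types, and the frequency formula assumes equal temperament with A4 (note 69) tuned to 440 Hz. Parsing a full .mid file (timing, tracks, tempo) is a larger task better left to a library.

```python
def decode_channel_message(status, data1, data2):
    """Decode one 3-byte MIDI channel message into a small dict."""
    kind = status & 0xF0            # upper nibble: message type
    channel = (status & 0x0F) + 1   # lower nibble: channel, shown as 1-16
    if kind == 0x90 and data2 > 0:
        return {"type": "note_on", "channel": channel, "note": data1, "velocity": data2}
    if kind == 0x80 or kind == 0x90:
        # A note-on with velocity 0 is conventionally treated as a note-off.
        return {"type": "note_off", "channel": channel, "note": data1, "velocity": data2}
    if kind == 0xB0 and data1 == 1:
        return {"type": "modulation", "channel": channel, "value": data2}
    return {"type": "other", "channel": channel}


def note_to_hz(note):
    """Equal-tempered frequency, assuming A4 (note 69) = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


middle_c_on = decode_channel_message(0x90, 60, 100)
middle_c_off = decode_channel_message(0x90, 60, 0)
a4_hz = note_to_hz(69)
```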
Aarav Kumar, 04/27/2025
USGS (United States Geological Survey) real-time and historical earthquake data: https://www.usgs.gov/programs/earthquake-hazards
Range of data for worldwide earthquakes with interactive maps
Can be obtained in CSV and GeoJSON forms, among others
CORGIS Dataset Project Earthquake Data: https://corgis-edu.github.io/corgis/csv/earthquakes/ (CSV data for earthquake magnitude, location, depth, and significance, but only through 2016)
Seismological Facility for the Advancement of Geoscience (SAGE), Wilber Tool: https://ds.iris.edu/wilber3/find_event
Gives global earthquake data, including seismographs and sound
Kaggle Earthquake Datasets
Earthquakes in Indonesia (Earthquake Repository, by BMKG): https://www.kaggle.com/datasets/kekavigi/earthquakes-in-indonesia
National Earthquake Information Center Data, from 1965 - 2016: https://www.kaggle.com/datasets/usgs/earthquake-database
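As a starting point for working with earthquake data in GeoJSON form, the sketch below filters events by magnitude. The property names ("mag", "place", "time" in milliseconds since the Unix epoch) and the [longitude, latitude, depth] coordinate order are assumptions based on my understanding of the USGS feed layout; the events themselves are made up. Check a real download before relying on these field names.

```python
import json
from datetime import datetime, timezone

# Made-up events in the assumed feed layout.
SAMPLE_FEED = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"mag": 4.6, "place": "placeholder region A", "time": 1700000000000},
            "geometry": {"type": "Point", "coordinates": [120.5, -8.2, 35.0]},
        },
        {
            "type": "Feature",
            "properties": {"mag": 2.1, "place": "placeholder region B", "time": 1700003600000},
            "geometry": {"type": "Point", "coordinates": [-122.3, 37.9, 8.0]},
        },
    ],
})


def strong_quakes(feed_text, min_mag):
    """Return events with magnitude >= min_mag as flat dicts."""
    events = []
    for feature in json.loads(feed_text)["features"]:
        props = feature["properties"]
        if props["mag"] is None or props["mag"] < min_mag:
            continue
        lon, lat, depth_km = feature["geometry"]["coordinates"]
        events.append({
            "place": props["place"],
            "mag": props["mag"],
            "lon": lon,
            "lat": lat,
            "depth_km": depth_km,
            "when": datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc),
        })
    return events


quakes = strong_quakes(SAMPLE_FEED, 4.0)
```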