Scientific Data
For this semester, we'll be focusing on finding and evaluating software for exploratory scientific visualization. The data types below can be used in your project and tutorials. Please add more data types and examples as you find them. Be sure to document any data you add to this page in your journal.
For an abbreviated list of common data types, a brief definition of each type, and recommended/feasible software to use with each type, please see the Software Recommendations by Data Type page.
Please note: many of the following files are very large (we're talking up to around a hundred MB for some of these)! Additionally, this page is a continual work in progress. The categories of data are not exhaustive and are listed in no particular order, so be prepared for the page to be somewhat messy.
2D Image Data
Description: 2D array of data values that can be reduced to an RGB image before viewing or interactively during viewing.
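As a rough illustration of what "reducing" a 2D array to an RGB image can mean, the sketch below linearly maps each value onto a blue-to-red ramp. The function name `to_rgb` and the choice of color ramp are illustrative assumptions; real viewers typically offer many colormaps.

```python
def to_rgb(values):
    """Map a 2D list of numbers to a 2D list of (r, g, b) tuples in 0-255."""
    flat = [v for row in values for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for constant images
    image = []
    for row in values:
        out_row = []
        for v in row:
            t = (v - lo) / span            # normalize to [0, 1]
            red = round(255 * t)           # high values become red
            blue = round(255 * (1 - t))    # low values become blue
            out_row.append((red, 0, blue))
        image.append(out_row)
    return image


# Tiny made-up array standing in for real image data.
data = [[0.0, 0.5], [1.0, 2.0]]
rgb = to_rgb(data)
```

Doing this mapping interactively during viewing, rather than once beforehand, lets you change the value range or the color ramp without touching the underlying data.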
Data:
Simple Animals (from Felice Frankel at MIT)
Cell Image Library - 2D images of many different types of cells
Electron Microscopy Public Image Archive (EMPIAR)
Lots of 2D microscopy data
3D Vector Field or Flow Data
Description: A magnitude and direction at each point in 3D space. Can be 3D flow. Can be time-varying.
Data:
Vibrating Cylinder Flow - George Karniadakis and Zhicheng Wang
Contains 10 .plt files each of which contains 12 components of data:
(x, y, z) position coordinates
(u, v, w) velocity components in the x, y, z directions
p, pressure
vorticity_x, vorticity_y, vorticity_z
Q, second invariant of the velocity gradient
Building Downwash Simulations
Turbulence Decay
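To make the relationship between the velocity components (u, v, w) and the stored vorticity fields more concrete, here is a minimal sketch that estimates the z-component of vorticity, dv/dx - du/dy, with central differences on a uniform 2D grid. It does not read .plt files; the synthetic rotating field, the grid spacing, and the function name `vorticity_z` are all assumptions made for illustration.

```python
def vorticity_z(u, v, dx, dy):
    """Central-difference estimate of dv/dx - du/dy at interior grid points.

    u[j][i] and v[j][i] are velocity components at x = i*dx, y = j*dy.
    Returns a (ny-2) x (nx-2) list covering interior points only.
    """
    ny, nx = len(u), len(u[0])
    out = []
    for j in range(1, ny - 1):
        row = []
        for i in range(1, nx - 1):
            dv_dx = (v[j][i + 1] - v[j][i - 1]) / (2 * dx)
            du_dy = (u[j + 1][i] - u[j - 1][i]) / (2 * dy)
            row.append(dv_dx - du_dy)
        out.append(row)
    return out


# Synthetic rigid rotation: u = -y, v = x, so the z-vorticity is 2 everywhere.
n, h = 5, 0.5
u = [[-(j * h) for i in range(n)] for j in range(n)]
v = [[i * h for i in range(n)] for j in range(n)]
wz = vorticity_z(u, v, h, h)
```

A check like this can be useful when a data set already ships with vorticity fields: recomputing one component from the velocities is a quick way to confirm that you are reading the columns in the right order.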
Point Cloud Data
Description: data that represents a collection of points in 2D or 3D space.
Data:
las files: Lidar data comes in many forms, but las files are one of the most popular Lidar data formats. An las file is a binary file that contains a header with file metadata, point cloud data stored as X, Y, Z coordinates, and optional fields such as RGB and GIS information. For more detailed information on las files, see the official las specification.
Ecology data from EEB Professor Jim Kellner and postdoc Loren Albert:
NEONDS Sample LiDAR Point Cloud Data:
Sample Velodyne .pcap files:
City of Montreal, Canada LiDAR
http://donnees.ville.montreal.qc.ca/dataset/lidar-aerien-2015 (note that this page is written in French)
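To give a feel for the las header described above, the following sketch reads a few header fields with Python's standard `struct` module and converts a stored integer coordinate to a real-world value (stored value times scale, plus offset). The byte offsets follow the LAS 1.2 public header layout as we understand it and should be checked against the official specification; the header bytes here are synthetic, and the function names are hypothetical.

```python
import struct


def parse_las_header(buf):
    """Read a few fields from a LAS public header block (assumed 1.2 layout)."""
    if buf[:4] != b"LASF":
        raise ValueError("not a las file: missing 'LASF' signature")
    major, minor = struct.unpack_from("<BB", buf, 24)
    # Point data format id, bytes per point record, number of point records.
    point_format, record_length, point_count = struct.unpack_from("<BHI", buf, 104)
    scale = struct.unpack_from("<3d", buf, 131)    # x, y, z scale factors
    offset = struct.unpack_from("<3d", buf, 155)   # x, y, z offsets
    return {
        "version": (major, minor),
        "point_format": point_format,
        "record_length": record_length,
        "point_count": point_count,
        "scale": scale,
        "offset": offset,
    }


def to_real(raw, scale, offset):
    """Convert a stored integer coordinate to a real-world coordinate."""
    return raw * scale + offset


# Synthetic 227-byte header with made-up values, so no download is needed.
header = bytearray(227)
header[0:4] = b"LASF"
struct.pack_into("<BB", header, 24, 1, 2)
struct.pack_into("<BHI", header, 104, 0, 20, 1000)
struct.pack_into("<3d", header, 131, 0.01, 0.01, 0.01)
struct.pack_into("<3d", header, 155, 500000.0, 4000000.0, 0.0)

info = parse_las_header(bytes(header))
x = to_real(12345, info["scale"][0], info["offset"][0])
```

For real work, a dedicated library such as laspy (see the tutorials below) is a better choice; newer las versions extend the header, so the point count read here may not be the authoritative one for every file.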
Point Cloud Visualization Software:
Point Cloud Processing Software:
Comparisons:
Tutorials:
Tutorial on Python-PCL
Tutorial on Converting .las files to .out files: Laspy Tutorial
Tutorial on Lidar Paraview to Blender
Tutorial on viewing .pcap files in Veloview.
Papers:
Summaries of papers on processing and rendering LiDAR data.
(T) signifies that a particular software package has been tested; ($) signifies that a license is required.
Planetary Geology Imaging Data
Description: This might be an amalgam of different data types, including collections of stars, collections of galaxies, outward-looking imagery (as from a telescope), or inward-looking imagery (as from a satellite of a planet). Some such data is 2D imaging data, but with a particular underlying space, e.g., Earth, the moon, or Mars.
Data:
NASA Global Imagery Browser Services (GIBS):
GIBS is a database of satellite imagery data collected by NASA JPL. The database supports an extensive REST API, but it can be somewhat complicated to use.
GIBS returns satellite data in the form of tiles or maps (in Mercator projection)
Earthdata is a massive database of atmospheric, land, and ocean data operated by NASA
Software:
OpenSpace can display some data of these types, and it has the potential to generalize to more. It almost runs in the Yurt and does run on desktops and HMDs.
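Tiled imagery in a Mercator projection is usually addressed by a zoom level plus a column and row index. The sketch below shows the widely used Web Mercator "XYZ" tile arithmetic for turning a longitude and latitude into tile indices. This is an assumption about the general technique, not a description of the GIBS API: GIBS defines its own tile matrix sets, so check its documentation before building request URLs. The function name is hypothetical.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return (column, row) of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    # Web Mercator is only defined up to about 85.0511 degrees latitude.
    lat = math.radians(max(min(lat_deg, 85.0511), -85.0511))
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or the latitude limits stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


tile = lonlat_to_tile(-71.4, 41.8, 2)
```

Row 0 is the northernmost row in this scheme, so the row index grows as latitude decreases.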
MRI Imaging Data
Description: MRI data comes in many forms, often depending on the type of MRI machine used to scan a patient. Two of the most common MRI data formats are NIfTI and DICOM. Luckily, most MRI processing software can easily convert between these two formats, so we will focus on visualizing and processing only NIfTI files. For an in-depth comparison between NIfTI and DICOM, see Medical Imaging Formats.
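As a small sketch of what a NIfTI file looks like on disk, the code below reads the volume shape and voxel size from a NIfTI-1 header using only the standard library. It assumes an uncompressed, little-endian NIfTI-1 file (not a gzipped .nii.gz), and the offsets follow the NIfTI-1 header layout as we understand it; the header bytes and dimensions are synthetic and the function name is hypothetical.

```python
import struct


def parse_nifti1_header(buf):
    """Read shape and voxel size from a little-endian NIfTI-1 header."""
    (sizeof_hdr,) = struct.unpack_from("<i", buf, 0)
    if sizeof_hdr != 348:
        raise ValueError("not a little-endian NIfTI-1 header")
    dim = struct.unpack_from("<8h", buf, 40)       # dim[0] = number of dimensions
    datatype, bitpix = struct.unpack_from("<hh", buf, 70)
    pixdim = struct.unpack_from("<8f", buf, 76)    # pixdim[1:4] = voxel size
    ndim = dim[0]
    return {
        "shape": dim[1:1 + ndim],
        "datatype": datatype,
        "bitpix": bitpix,
        "voxel_size": pixdim[1:1 + min(ndim, 3)],
        "magic": bytes(buf[344:348]),
    }


# Synthetic header with made-up dimensions: a 64 x 64 x 32 volume of
# 16-bit integers (datatype code 4) with 2 x 2 x 3 voxels.
header = bytearray(348)
struct.pack_into("<i", header, 0, 348)
struct.pack_into("<8h", header, 40, 3, 64, 64, 32, 1, 1, 1, 1)
struct.pack_into("<hh", header, 70, 4, 16)
struct.pack_into("<8f", header, 76, 1.0, 2.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0)
header[344:348] = b"n+1\x00"

info = parse_nifti1_header(bytes(header))
```

In practice a library that understands NIfTI will handle compression, byte order, and orientation for you; this sketch is only meant to show that the shape and voxel spacing are cheap to inspect before loading a large scan.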
Data:
Brain Tumor Segmentation Challenge (BRATS) - BRATS is a comprehensive MRI brain tumor data set comprised of 243 MRI scans, expertly labeled based on tumor pathology.
Alzheimer's Disease Neuroimaging Initiative (ADNI) - ADNI contains a collection of over 3000 MRI scans of Alzheimer's patients, captured over several years and with varying MRI scanners.
Note that neither the BRATS nor the ADNI data set is public; access to both must be requested.
OpenNeuro is a free and open platform that allows users to share, download, and use BIDS-compliant MRI, PET, MEG, EEG, and iEEG data. BIDS stands for the Brain Imaging Data Structure, the emerging standard for the organization of neuroimaging data.
Software:
Papers:
Human brain functional MRI and DTI visualization with virtual reality: an in-depth overview of VR visualization of MRI data.
The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS): an overview of the BRATS data set.
Medical Imaging Formats: a comprehensive comparison of the most popular MRI data formats.
Similar to:
CLARITY Brain Imaging
CT Brain imaging
Network Data
Data:
Twitter API
Probably the easiest social media API to use; clearly defined data types and properties (just Tweets associated with users and their info)
Access Tweets in realtime; filter by many different properties
Access previous Tweets for other types of analysis
Apply for Twitter Developer tools here
[2023 Feb. Update Regarding Twitter API]
On February 9, 2023, Twitter ended its free access offering. Twitter API v2 will have 3 tiers (one coming soon).
The prices for these tiers are TBD according to Twitter's official documentation as of February 24, 2023.
Essential Plan will offer
1 project
1 app per project
retrieving up to 500k Tweets per month
Access to Twitter API v2 and standard v1.1
No access to Twitter API premium v1.1 and enterprise
Elevated Plan will offer
1 project
3 apps per project
retrieving up to 3 million Tweets per month
Access to API standard 1.1, premium v1.1, enterprise
Academic Research
1 project
1 app per project
retrieving up to 10 million Tweets per month
Access to API standard 1.1, premium v1.1, enterprise
Details on these access levels can be found here
2012 US election tweets - Microsoft Research
Daily geolocated tweets - Microsoft Research
Citation network dataset
Contains citation data extracted from Microsoft Academic Graph and other academic databases
DIMACS Shortest Path Challenge - Center for Discrete Mathematics and Theoretical Computer Science
http://users.diag.uniroma1.it/challenge9/data/USA-road-d/USA-road-d.USA.gr.gz
Graph data that models distances between many major U.S. towns and cities (23,947,347 nodes)
US Flight Data (1990 - 2009)
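The DIMACS road network above is distributed as a text-based .gr file. As far as we understand the challenge format, lines starting with `c` are comments, a single `p sp <nodes> <arcs>` line gives the graph size, and each `a <from> <to> <weight>` line is one directed arc. The sketch below parses that layout from any iterable of lines; the sample text is made up and the function name is hypothetical.

```python
def parse_dimacs_gr(lines):
    """Parse DIMACS shortest-path .gr lines into (n_nodes, n_arcs, arcs)."""
    n_nodes = n_arcs = 0
    arcs = []
    for line in lines:
        parts = line.split()
        if not parts or parts[0] == "c":
            continue  # blank line or comment
        if parts[0] == "p":
            # Problem line: "p sp <nodes> <arcs>"
            n_nodes, n_arcs = int(parts[2]), int(parts[3])
        elif parts[0] == "a":
            # Arc line: "a <from> <to> <weight>", node ids are 1-based
            arcs.append((int(parts[1]), int(parts[2]), int(parts[3])))
    return n_nodes, n_arcs, arcs


# Tiny made-up graph in the same layout.
SAMPLE = """c sample graph
p sp 3 4
a 1 2 803
a 2 1 803
a 2 3 158
a 3 2 158
"""
n_nodes, n_arcs, arcs = parse_dimacs_gr(SAMPLE.splitlines())
```

Because the real file is gzipped and very large, passing the file object from `gzip.open(path, "rt")` to the parser streams it line by line instead of decompressing it to disk first, although holding every arc in a Python list will still use a lot of memory.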
Polygonal Model Data
Description: 3D data stored as polygonal meshes
Stored in many 3D file formats, including OBJ, DAE, and FBX
Programs that use and view polygonal model data that could be YURT usable:
Unity, Blender, Unreal, Paraview
Unlikely to be YURT usable:
Maya, Adobe Dimension
Data:
YURT Supernova (from Elaine Jiang)
World Vector Shorelines
http://www.earthmodels.org/data-and-tools/coastlines/World_Vector_Shorelines.vtp
Surface model of the world's shorelines
City of Adelaide 3D Model
Open City Model
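Of the polygonal formats mentioned above, OBJ is plain text and easy to inspect by hand: `v` lines list vertex positions and `f` lines list faces as 1-based vertex indices. The sketch below reads just those two record types. It is a simplification (it ignores normals, texture coordinates, materials, and negative indices), the sample mesh is made up, and the function name is hypothetical.

```python
def parse_obj(text):
    """Extract vertex positions and faces from OBJ text (positions only)."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # blank line or comment
        if parts[0] == "v":
            vertices.append(tuple(float(p) for p in parts[1:4]))
        elif parts[0] == "f":
            # Entries look like "3", "3/1", or "3/1/2"; keep the vertex index
            # and convert from 1-based to 0-based.
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
    return vertices, faces


# Made-up mesh: a unit square split into two triangles.
SAMPLE_OBJ = """# unit square
v 0 0 0
v 1 0 0
v 1 1 0
v 0 1 0
f 1 2 3
f 1/1 3/2 4/3
"""
vertices, faces = parse_obj(SAMPLE_OBJ)
```

Counting vertices and faces this way is a quick sanity check on a downloaded model before importing it into Unity, Blender, or Paraview.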
Time Series Data
Description: Series of data points indexed by time.
Data:
CIT time series data
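As a minimal example of working with data indexed by time, the sketch below sorts readings by timestamp and computes a trailing moving average, a common first step for smoothing a noisy series before plotting. The readings are invented and the function name is hypothetical.

```python
from datetime import datetime


def moving_average(series, window):
    """Trailing moving average over a list of (datetime, value) pairs."""
    ordered = sorted(series)  # sort by timestamp; input may be out of order
    out = []
    for k in range(window - 1, len(ordered)):
        chunk = [value for _, value in ordered[k - window + 1:k + 1]]
        out.append((ordered[k][0], sum(chunk) / window))
    return out


# Made-up hourly readings, deliberately out of order.
readings = [
    (datetime(2023, 1, 1, 2), 30.0),
    (datetime(2023, 1, 1, 0), 10.0),
    (datetime(2023, 1, 1, 3), 40.0),
    (datetime(2023, 1, 1, 1), 20.0),
]
smoothed = moving_average(readings, window=2)
```

Each output point is stamped with the time of the last reading in its window, so the smoothed series starts `window - 1` samples later than the raw one.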
Genomics Data
Description: Data that captures genetic information, such as DNA sequences and variants in genome sequences.
Data:
Genome variant calls of chromosome 22 - Microsoft Research
RGB-D
Description: Output from cameras with proximity sensors contains an extra channel for depth. Its value ranges from 0 to 255, just like color. It is often visualized alone as grayscale.
Data:
https://sumochallenge.org has output from a 360º scan around a room
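To illustrate the extra depth channel, this sketch splits a small RGB-D image into a color image and a grayscale rendering of depth. It assumes each pixel is stored as an (r, g, b, d) tuple with all four channels in 0-255, as in the description above; the pixel values are made up and the function name is hypothetical.

```python
def split_rgbd(pixels):
    """Split a 2D list of (r, g, b, d) pixels into color and depth images."""
    color = [[(r, g, b) for r, g, b, _ in row] for row in pixels]
    # Repeat the depth value in all three channels to view it as grayscale.
    depth_gray = [[(d, d, d) for _, _, _, d in row] for row in pixels]
    return color, depth_gray


# Made-up 1 x 2 image: a red pixel and a blue pixel at different depths.
rgbd = [[(255, 0, 0, 40), (0, 0, 255, 200)]]
color, depth_gray = split_rgbd(rgbd)
```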
GeoJSON Data
Description: Open standard format for representing geographical data in JSON files.
Visualization Tools: VR-Viz
Data:
World Air Pollution Dataset
Outline of Brazilian States
OpenStreetMap
Map of Streets and Terrain Types of the entire world
Tools:
World Cities Environmental Data
Refer to Postman Scraping with Ambee for more info on how to load data easily (may need to create new temp email account if calls to API exceed free version limits)
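Because GeoJSON is ordinary JSON, Python's standard `json` module is enough to read it. The sketch below pulls point features out of a FeatureCollection and flattens each into a record. Note that GeoJSON stores coordinates as longitude first, then latitude. The station names and pm25 values are invented for illustration, and the function name is hypothetical.

```python
import json

# A made-up FeatureCollection; names and pm25 values are invented.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-46.63, -23.55]},
     "properties": {"name": "Station A", "pm25": 18.0}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [138.6, -34.93]},
     "properties": {"name": "Station B", "pm25": 7.5}}
  ]
}
"""


def point_records(collection):
    """Flatten Point features into dicts with lon, lat, and properties."""
    records = []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Point":
            continue  # skip lines, polygons, and other geometry types
        lon, lat = geometry["coordinates"][:2]  # GeoJSON order is lon, lat
        records.append({"lon": lon, "lat": lat, **feature["properties"]})
    return records


records = point_records(json.loads(GEOJSON_TEXT))
```

Swapping longitude and latitude is one of the most common mistakes when moving GeoJSON into a visualization tool, so it is worth printing a known location and checking it on a map.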
CSV Data
Liza Kolev 2023
Description: A comma-separated values file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas.
Data:
Renewable Internal Freshwater Resources Across World
Only has data for 2007, 2012, and 2014 (as of 2023)
Percentage Population Using Safely Managed Water Services Across World
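The record-and-field structure described above maps directly onto Python's standard `csv` module. The sketch below reads a small table keyed by country and year and skips rows whose value is missing, which matters for data sets that only cover a few years. The column names, country names, and numbers are all made up, and the function name is hypothetical.

```python
import csv
import io

# Made-up table; the column names and values are invented for illustration.
CSV_TEXT = """country,year,value
Aland,2007,1200.5
Aland,2012,
Borduria,2007,830.0
"""


def load_values(text):
    """Return {(country, year): value}, skipping rows with no value."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["value"] == "":
            continue  # skip missing measurements rather than treating them as 0
        table[(row["country"], int(row["year"]))] = float(row["value"])
    return table


table = load_values(CSV_TEXT)
```

For a file on disk, the same function body works with `open(path, newline="")` in place of `io.StringIO(text)`.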
Shapefile Data
Liza Kolev 2023
Description: The shapefile format is a geospatial vector data format for geographic information system software. It is developed and regulated by Esri as a mostly open specification for data interoperability among Esri and other GIS software products.
Data:
World boundaries that you can download according to how specific and labeled you want it to be
Data will download in a zip file with different kinds of datasets. Shapefile ends in .shp.
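Before loading a .shp file into a GIS tool, it can help to peek at its 100-byte header, which records the shape type and bounding box. The sketch below does that with the standard `struct` module. The offsets follow the Esri shapefile layout as we understand it (a big-endian file code and length, then little-endian version, shape type, and bounding box); the header bytes are synthetic and the function name is hypothetical.

```python
import struct

# A few common shape type codes; other codes are returned as plain integers.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def parse_shp_header(buf):
    """Read file length, shape type, and bounding box from a .shp header."""
    (file_code,) = struct.unpack_from(">i", buf, 0)
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    (length_words,) = struct.unpack_from(">i", buf, 24)  # in 16-bit words
    version, shape_type = struct.unpack_from("<ii", buf, 28)
    bbox = struct.unpack_from("<4d", buf, 36)  # xmin, ymin, xmax, ymax
    return {
        "file_bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, shape_type),
        "bbox": bbox,
    }


# Synthetic header with made-up values: an empty polygon file covering the globe.
header = bytearray(100)
struct.pack_into(">i", header, 0, 9994)
struct.pack_into(">i", header, 24, 50)
struct.pack_into("<ii", header, 28, 1000, 5)
struct.pack_into("<4d", header, 36, -180.0, -90.0, 180.0, 90.0)

info = parse_shp_header(bytes(header))
```

Since the download arrives as a zip archive, `zipfile.ZipFile(path).namelist()` from the standard library is a quick way to find the member ending in .shp without extracting everything.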
Soccer Data