(‘mind reading’ defined as reconstruction of visual perception)
Since 2017 there has been a new wave of visual reconstruction research (which can be seen as a form of ‘mind reading’), mostly driven by the popularity of powerful new generative models. Due to the obvious link to modern neural networks and machine learning these papers have also gained wide attention within the machine learning community.
This article is a short introduction to the nature of functional MRI data and to open data sets for people who want to contribute to this field but are new to neuroimaging research. Currently these data sets are spread all over the internet, hidden deep inside various repositories and lab websites, so it is far from obvious to newcomers that data for developing new ideas is easily accessible.
The nature of functional MRI data
There are various neuroimaging techniques, but functional MRI is the most spatially accurate technique we have for healthy human subjects (among non-invasive techniques, i.e. as long as you don’t stick electrodes into somebody’s brain). It measures the blood oxygenation level dependent (BOLD) signal in a 3D-localized piece of brain matter, which changes due to local energy consumption. You can measure it with MRI because deoxygenated blood has different magnetic properties than oxygenated blood. The BOLD signal seems to be interpretable as local brain activity, and you can make the same assumption when you use this data for machine learning.
Often authors will already have sub-selected the necessary voxels (the local activity feature, measured inside 3D pixels) for you, so every data point will just be an easy-to-handle long vector of local activity values. If you get a 3D (or for video data, 4D) matrix in an obscure neuroimaging format like NIfTI (.nii) or DICOM, the authors have provided you with the full functional MRI image. This lets you sub-select either gray matter or regions of interest (ROIs, e.g. various regions of the visual system) yourself, as much of that 3D box will not contain brain matter. The necessary masks should have been provided together with the data. If you got NIfTIs you can also get a blurry impression of the participant’s brain by loading the .nii file into viewers such as MRIcron.
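To make the masking step concrete, here is a minimal sketch of how a 4D recording turns into the "long vector per data point" format. It uses a synthetic NumPy array instead of a real file, and the array sizes and the ROI location are made up for illustration. With real data you would typically get the array from a NIfTI file first, for example with the nibabel package.

```python
import numpy as np

# Synthetic stand-in for a 4D fMRI recording with axes (x, y, z, time).
# With real data this array would usually come from a NIfTI file,
# e.g. via nibabel: nib.load(path).get_fdata()
rng = np.random.default_rng(0)
bold = rng.normal(size=(4, 5, 6, 10))

# Hypothetical ROI mask: a boolean 3D array with the same spatial shape.
mask = np.zeros((4, 5, 6), dtype=bool)
mask[1:3, 2:4, 1:5] = True  # 2 * 2 * 4 = 16 voxels inside the ROI

# Boolean indexing over the three spatial axes keeps only the masked voxels.
# The result has shape (n_voxels, n_timepoints).
voxels_by_time = bold[mask]

# Transpose so that every row is one time point (one data point for learning).
samples = voxels_by_time.T

print(samples.shape)  # (10, 16)
```

Each row of `samples` is now one of the activity vectors described above, and the same mask can be reused for every run of the same participant as long as the images are aligned.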
fMRI is far from being an optimal measure of brain activity. It measures blood oxygenation changes that only occur after local energy expenditure, and the peak of the signal (the influx of fresh oxygenated blood) lags behind by several seconds, so you will only ever get an indirect measure. Besides low temporal accuracy it also suffers from several sources of physiological noise (e.g. heart beat, location of big brain arteries, participant movements at a scale of millimeters, brain moving or being squished inside the skull…). Also, a typical voxel contains millions of actual neurons and glial cells and billions of synapses. However, as the brain and especially the visual system have a complex and very localized blood supply, you still get a lot of fine-grained activity detail. The signal-to-noise ratio seems to be sufficient for reconstructing what somebody sees after all.
The delay of the BOLD signal is described in the hemodynamic response function (HRF). The function you see below is the canonical hemodynamic response function (as available in various open source neuroimaging software modules):
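It can be described as a combination of two gamma distributions, one modeling the peak (overshoot) and one the undershoot:

$$h(t) = \frac{\beta_1^{\alpha_1} t^{\alpha_1 - 1} e^{-\beta_1 t}}{\Gamma(\alpha_1)} - c \, \frac{\beta_2^{\alpha_2} t^{\alpha_2 - 1} e^{-\beta_2 t}}{\Gamma(\alpha_2)}$$

Here $\alpha_1, \beta_1$ shape the peak, $\alpha_2, \beta_2$ shape the undershoot, and $c$ scales the undershoot relative to the peak.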
The peak of the BOLD signal (amplitude $y$) usually needs to be aligned with the stimulus occurring a few seconds before, which is done by temporal convolution with the HRF: $y(t) = (s * h)(t)$ (with $s$ being the unconvolved stimulus signal and $h$ the canonical HRF, at time $t$). The HRF can vary quite a bit depending on local or temporal physiological differences, so the canonical HRF model is considered inflexible. If feasible it is best to adjust (learn) it separately for every voxel. One paper where this has been done in a simple way is (Nishimoto 2011).
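A minimal sketch of a double-gamma HRF and of the convolution step may help here. The parameter values (shapes 6 and 16, rate 1, undershoot ratio 1/6) are commonly used defaults that are assumed for this example, not values taken from any of the data sets below, and the 1 s sampling interval is also just an assumption.

```python
import math
import numpy as np

def gamma_pdf(t, shape, rate):
    """Gamma probability density, evaluated elementwise for t >= 0."""
    t = np.asarray(t, dtype=float)
    return (rate ** shape) * t ** (shape - 1) * np.exp(-rate * t) / math.gamma(shape)

def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, rate=1.0, ratio=1.0 / 6.0):
    """Peak gamma density minus a scaled undershoot gamma density."""
    return gamma_pdf(t, peak_shape, rate) - ratio * gamma_pdf(t, under_shape, rate)

tr = 1.0                      # assumed sampling interval in seconds
t = np.arange(0.0, 32.0, tr)  # evaluate the kernel over 32 seconds
hrf = double_gamma_hrf(t)

# Unconvolved stimulus signal: a single brief event at sample 10.
onset = 10
stimulus = np.zeros(60)
stimulus[onset] = 1.0

# Temporal convolution with the HRF, truncated to the recording length.
predicted = np.convolve(stimulus, hrf)[: len(stimulus)]

# Delay between the stimulus onset and the peak of the predicted response.
peak_delay = (np.argmax(predicted) - onset) * tr
print(peak_delay)  # 5.0
```

With these assumed parameters the predicted response peaks about five seconds after the event and dips below baseline afterwards, which is the delay you have to account for when pairing brain volumes with stimuli.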
When you are trying to reconstruct static images, these steps have usually already been done for you during preprocessing. For reproducibility, and if you are unfamiliar with neuroimaging, you may just want to stick with the preprocessing provided by the original authors. For video stimuli, however, you have to take this delay into account yourself.
Image stimulus data sets
In the following experiments participants saw static images while fixating on the center of the screen. In several data sets the test set images were presented repeatedly and the responses averaged, while images from the training set were only presented once or twice. Averaging over repetitions gives a cleaner signal in the test set, while the training set recording is aimed at covering more variance (but will usually be very noisy).
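The effect of averaging repetitions can be illustrated with a small simulation. Everything here is synthetic: the number of images, voxels and repetitions, and the noise level are arbitrary assumptions, and the noise is assumed to be independent across repetitions, which real fMRI noise only approximates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_voxels, n_repeats = 50, 200, 10

# Simulated "true" response per image and voxel.
signal = rng.normal(size=(n_images, n_voxels))

# Every repetition is the true response plus independent measurement noise.
noise = rng.normal(scale=3.0, size=(n_repeats, n_images, n_voxels))
trials = signal[None, :, :] + noise

single = trials[0]              # one presentation per image
averaged = trials.mean(axis=0)  # mean over all repetitions

def noise_sd(responses):
    """Standard deviation of the deviation from the simulated true response."""
    return float(np.std(responses - signal))

print(noise_sd(single), noise_sd(averaged))
```

Under the independence assumption the noise standard deviation shrinks by roughly the square root of the number of repetitions, so ten repetitions reduce it by a factor of about three.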
100 examples of MNIST handwritten digits 6 and 9 (80 in train, 20 in test, class-balanced), presented in a single-participant experiment.
van Gerven, M., de Lange, F. P., & Heskes, T. (2010). Neural decoding with hierarchical generative models. Neural Computation, 22, 3127–42. (direct link)
32 10-by-10 binary patterns of geometric shapes, alphabet characters and random patterns in a single subject, recorded across the visual system. (raw data on OpenNeuro)
Miyawaki, Y., Uchida, H., Yamashita, O., Sato, M., Morito, Y., Tanabe, H. C., Sadato, N., and Kamitani, Y. (2008). Visual image reconstruction from human brain activity using a combination of multiscale local image decoders. Neuron, 60(5):915–929. (direct link)
360 class-balanced examples of 6 handwritten characters presented to two subjects (288 in train, 72 in test). Measured in primary visual cortex (V1) and V2.
Schoenmakers, S., Barth, M., Heskes, T., and van Gerven, M. A. J. (2013). Linear reconstruction of perceived images from human brain activity. NeuroImage, 83:951–961. (direct link)
Schoenmakers, S., Güçlü, U., van Gerven, M. A. J., and Heskes, T. (2015). Gaussian mixture models and semantic gating improve reconstructions from human brain activity. Frontiers in Computational Neuroscience, 8:173. (direct link)
These are fMRI responses of 2 subjects to 1750 masked training images (averaged over 2 repetitions) and 120 test images (averaged over 13 repetitions), measured across the larger visual system.
Kay, K. N., Naselaris, T., Prenger, R. J., and Gallant, J. L. (2008). Identifying natural images from human brain activity. Nature, 452(7185):352–355. (direct link)
Naselaris, T., Prenger, R. J., Kay, K. N., Oliver, M., and Gallant, J. L. (2009). Bayesian reconstruction of natural images from human brain activity. Neuron, 63(6):902–915. (direct link)
5 subjects, 1200 training set images taken from 150 ImageNet categories, 50 test set images of different categories (not in the original ImageNet set). One cool thing about this data set is that it contains an imagery set. (raw data on OpenNeuro)
Horikawa, T. and Kamitani, Y. (2017). Generic decoding of seen and imagined objects using hierarchical visual features. Nature Communications, 8. (direct link)
A massive fMRI recording of brain responses to 5254 images from ImageNet, Scene Understanding (SUN) and MS COCO in 4 subjects.
Chang, N., Pyles, J. A., Gupta, A., Tarr, M. J., & Aminoff, E. M. (2018). BOLD5000: A public fMRI dataset of 5000 images. arXiv preprint arXiv:1809.01281. (direct link)
Video stimulus data sets
Unsurprisingly the participants have been exposed to videos here. So-called spatiotemporal naturalistic stimulation seems to generate stronger brain responses, as videos are more engaging than flashing still images, but you have to find a way to handle the hemodynamic delay (as introduced above). Again SNR can be increased by repeating and averaging, which is usually done for test sets.
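One simple way to handle the hemodynamic delay for video data is to give a linear model several time-shifted copies of the stimulus features and let it learn which delays matter. The sketch below is an illustration under assumptions, not the method of any specific paper listed here: `add_delays` is a hypothetical helper, delays are counted in samples, and the data is simulated so that the "voxel" responds with a delay of exactly 3 samples.

```python
import numpy as np

def add_delays(features, delays):
    """Stack time-shifted copies of a (time, feature) matrix side by side."""
    n_time, n_feat = features.shape
    out = np.zeros((n_time, n_feat * len(delays)))
    for i, d in enumerate(delays):
        # Row t of this block holds the features from d samples earlier.
        out[d:, i * n_feat:(i + 1) * n_feat] = features[:n_time - d]
    return out

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))  # stimulus features per time point

# Simulated voxel: responds to the features shown 3 samples earlier.
true_weights = np.array([1.0, -2.0, 0.5])
bold = np.zeros(200)
bold[3:] = features[:-3] @ true_weights

# Offer the model delays of 2, 3 and 4 samples and fit by least squares.
delays = [2, 3, 4]
design = add_delays(features, delays)
weights, *_ = np.linalg.lstsq(design, bold, rcond=None)

# One row of weights per delay; only the delay-3 row should be non-zero.
weights_per_delay = weights.reshape(len(delays), -1)
print(np.round(weights_per_delay, 3))
```

Because each voxel gets its own weights for every delay, this kind of model effectively learns a coarse, voxel-specific response shape instead of relying on one fixed canonical HRF. With real, noisy data you would usually add regularization to the regression.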
If you are interested in this branch of research you have certainly seen this video. The data for the underlying paper has been available for a long time now, and will provide you with 7200 time points of training data and 540 time points of test data for 3 subjects, measured across the visual system. Note that the videos were shown at half speed in the scanner, which may facilitate reconstruction given the slow hemodynamic response.
Nishimoto, S., Vu, A. T., Naselaris, T., Benjamini, Y., Yu, B., and Gallant, J. L. (2011). Reconstructing visual experiences from brain activity evoked by natural movies. Current Biology, 21(19):1641–1646. (direct link)
This is a huge project where 20 participants watched the movie Forrest Gump with audio narration (written for visually impaired people) inside the MRI scanner. Their data was recorded at high spatial fMRI resolution in one of the world’s few 7T scanners. Viewing conditions were natural: The movie was shown in real time and participants were not required to fixate. Due to copyright you will have to build the stimulus material yourself, but the authors provide detailed instructions for replicating the stimuli.
Hanke, M., Baumgartner, F. J., Ibe, P., Kaule, F. R., Pollmann, S., Speck, O., Zinke, W. & Stadler, J. (2014). A high-resolution 7-Tesla fMRI dataset from complex natural stimulation with an audio movie. Scientific Data, 1, 140003. (direct link)