The tipping point: From animal intelligence to human intelligence

There are many obvious things that we humans do to a much larger degree than other animals. We construct great civilizations, we create advanced technology, we use complex language, we make art and tell stories. How do our unique capabilities guide us in figuring out how our brains are different from those of other animals, if they are?

To me, the most revealing feature of human intelligence is that it is primarily societal, rather than individual. Most of what each of us knows or understands is taught to us, rather than things we figured out. We have found a way to accumulate intelligence across individuals and across generations, and because of this, collective human intelligence has exploded over the past few thousand years. This accumulation is the basis of nearly all of our advances. Each human who pushes the envelope of human knowledge is first a prodigious student of the state of the art at the time.

So, what does the brain need to do to support this kind of capability, and what brain architecture might be employed to implement it? My guesses at the answers to these questions are described in an article posted on arXiv entitled A Reservoir Model of Explicit Human Intelligence; what follows is a brief summary.

Our first innovation was imagination. By this I mean the ability to perform mental processing on things that are hypothetical rather than part of the immediate physical present. Without imagination, the brain is restricted to being an input-output mapping machine. The development of imagination seems to me to be the hardest evolutionary step. To support off-line processing, we had to develop mechanisms to switch between a real-world mode, vigilant of our surroundings and reacting appropriately to them, and an off-line mode, where we are free to consider hypothetical scenarios, predict potential outcomes, and ponder. This required neural mechanisms in the brain, likely involving the default mode network, but also community and societal mechanisms to provide safety to those who are ‘daydreaming’. Some point to the stone tool industry as early evidence of imagination, starting around 1M years ago, but imagination was clearly solidified by the time we were making sophisticated art on cave walls about 80K years ago.

Enabled by imagination, the second innovation was language. Even with access to an off-line world model, without labels for things that are not present at the moment, we are limited in our communication to direct demonstration of objects and actions that we wish to convey, like a traveler with no knowledge of the local language. But with labels for both objects and actions, we can describe, record, and accumulate. Words also allow us to categorize, define, and produce higher levels of abstraction, as we do with mathematical theorems.

With imagination and language, I think that humans just expanded existing associative networks and mechanisms to develop what is now called explicit, reportable, or explainable intelligence, the stuff we accumulate and pass on. Lower animals can easily be taught to make associations between previously unrelated stimuli by simply juxtaposing them, as in the classic experiments performed by Pavlov on dogs. Using that same kind of network, we build a web of associations, organized by the curricular plan that our teachers, parents, and mentors define, and construct in our students a distillation of human knowledge. Excitation of elements of the network can initiate excitation that produces output actions, or run along recurrent paths representing internal thought. It’s a big web, anchored by the 20,000 or so words we learn, with hundreds of thousands more abstractions added in, including all of our long-term memories. Words serve as a random access addressing system to directly excite sequences of abstractions in our brains, and also influence others by exciting sequences in their brains as well.

The previous billion years of evolution has done a slow but steady job of accumulating ever increasing intelligence in our genomes. But a tipping point occurred only a few thousand years ago, when intelligence began to be accumulated by the society itself, rather than by mutations in the genome. Accumulable intelligence requires that the knowledge be describable in a compact form for communication, so the intelligence must be stored in a form that is transparent, and a simple (though large) associative network may suffice. “Lower level” processes like visual processing are actually more complex, but do not need to be reportable in detail, and so have the luxury of utilizing deep networks with layers of hidden representations when they are discoverable by evolution.

I think that the two enabling developments for accumulable intelligence, capacities for imagination and language, were evolutionary innovations, probably driven by intelligence as a competitive advantage in changing natural environments. However, once this accumulation began, acceleration of collective intelligence became inevitable, despite the fact that the original evolutionary pressure largely evaporated when we mastered our environment.

The Challenge of BWAS: Unknown Unknowns in Feature Space and Variance

The paper by Marek et al (Reproducible brain-wide association studies require thousands of individuals, Nature, 602, 7902, pp 654-660, 2022) came out recently, and caused a bit of a stir in the field for a couple of reasons. First, the title, while an accurate description of the findings of the paper, is bold and lacks just enough qualifiers to quell immediate questions. “Does this imply that fMRI or other measures used in BWAS are lacking intrinsic sensitivity?” “Is this a general statement about all studies now and into the future?” “Is fMRI doomed to require thousands of individuals for all studies?” The answer to all these questions is “no,” as becomes clear on reading the paper.

Secondly, I think that the reaction of many on reading the title was a sigh and a thought that this is yet another paper in the same vein as the dead salmon study, the double dipping paper, or the cluster failure paper that makes a cautionary statement about fMRI that is then wildly spun by the popular media to imply more damning impact than brain imaging experts would gather. Again, it’s not this kind of paper, although there was a bit of hyperbole in places. The Nature News article titled “Can brain scans reveal behavior? Bombshell study says not yet” discusses this in an overall reasonable manner, but the need for an attention-grabbing title was unfortunate. The study was not a bombshell. The Marek study was a clear, even-handed, well-done (clearly a huge amount of work!) description of a specific type of comparison in fMRI and MRI performed in a specific way. While my reaction to the Marek paper was one of mild surprise that the reported correlation values were a bit lower than expected, I was more curious than anything, and thankful that such a study was performed to clarify precisely where the field – again, for a specific type of study performed in a specific manner – was.

I was asked by several groups to comment on it. First, I discussed my thoughts with Nature News. At the time of my discussion, I was still not certain what I thought of the paper, and was suggesting that there may be sources of error and low power that might be improved upon, such as population selection, the choice of resting state as the measure, time series noise, or even spatial normalization pipelines that might be smearing out much of the useful information. I aimed to emphasize in that discussion that it should be made clear that the Marek paper is emphatically NOT a statement about the intrinsic sensitivity of fMRI, which is sensitive enough to reliably detect activation in single subjects, and even in single runs or with single events. It was more a statement on the challenges of extracting subtle differences between populations having different behaviors. While I feel that there is quite a bit that can be done to push the necessary numbers down (as a field, we are really just getting started), I can’t rule out the possibility that people may just be too different in how their brains manifest differences in behavior – thus confounding attempts to capture population effects. It’s really an interesting question for future study.

I was also asked to write something for an upcoming collection of opinions on the Marek paper to be published in Aperture Neuro – a new publishing platform associated with the Organization for Human Brain Mapping. I finally submitted it a few weeks ago.

In the meantime, four of the authors (Scott Marek, Brenden Tervo-Clemmens, Damian Fair, and Nico Dosenbach) graciously agreed to be interviewed by me on the OHBM Neurosalience Podcast. This episode can be reached here. During this truly outstanding conversation, the authors further clarified the methods and impact of the paper. I pushed them on all the things that could be improved, methodologically, to bring these numbers down, but was just a bit further swayed that one implication of these results may be that the variability of people, as we currently sort them based on their behavior, really might be larger than we fully appreciate. It should be emphasized that the authors’ main message was overall extremely positive on the potential impact and importance of these large N studies as well as the many other ways that fMRI can be used with small N or even individual subjects to assess activity or changes in activity with interventions.

I was lastly asked to write a commentary for Cell Press’s new flagship medical and translational journal, Med, which I just submitted yesterday and am adding to this blog post, below. However, before you read that, I wanted to leave you with a thought experiment that might help illustrate the challenge – at least as I see it:

It’s been shown that fMRI can track changes in brain activity or connectivity with specific interventions. Let’s say, after a month of an intervention, we clearly see a change. This is not unreasonable and has been reported often. We repeat this for 100 or 1000 subjects. In each subject, we can track a change! Now, here’s the problem. If we repurposed this study as a BWAS study by grouping all subjects together before and then after the intervention and comparing the groups, the implication (as I understand Marek et al) is that we would likely not see a reliable effect come through, and those effects that we did see from this BWAS-style approach would lack the richness of the individual changes that we are able to see longitudinally in every one of the subjects. The implication is that each subject’s brain changed in a way that was reliably measured with fMRI, but each brain changed in a way that was just different enough that, when grouped, the effects mostly disappeared. Again, this is just a hypothetical thought experiment. I would love to see such a study done, as it would shed light on specifically what it is about BWAS studies that results in effect sizes that are lower than intuition suggests.
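To make the thought experiment concrete, here is a minimal simulation sketch in Python. It is a toy under loud assumptions, not an analysis from Marek et al: the function name, the number of regions, the noise level, and especially the rule that each subject's true change sits in one random region with a random sign are all invented for illustration.

```python
import random
import statistics


def simulate_intervention(n_subjects=200, n_regions=40, n_runs=20,
                          effect=1.0, noise_sd=1.0, seed=0):
    """Toy model: every subject changes, but in a different place.

    Returns (within_subject_hit_rate, largest_group_mean_change).
    """
    rng = random.Random(seed)
    hits = 0
    group_sum = [0.0] * n_regions
    for _ in range(n_subjects):
        # Assumption: each subject's true change sits in one random region,
        # with a random sign, so there is no shared direction across people.
        changed = rng.randrange(n_regions)
        sign = rng.choice((-1.0, 1.0))
        means = []
        for region in range(n_regions):
            true_change = sign * effect if region == changed else 0.0
            # Post-minus-pre difference measured over repeated runs.
            runs = [true_change + rng.gauss(0.0, noise_sd)
                    for _ in range(n_runs)]
            means.append(statistics.fmean(runs))
        # Within-subject analysis: pick the region with the largest mean change.
        detected = max(range(n_regions), key=lambda r: abs(means[r]))
        hits += detected == changed
        for region in range(n_regions):
            group_sum[region] += means[region]
    # BWAS-style analysis: average the change maps across all subjects.
    group_mean = [total / n_subjects for total in group_sum]
    return hits / n_subjects, max(abs(m) for m in group_mean)


hit_rate, largest_group_change = simulate_intervention()
print(f"within-subject detection rate: {hit_rate:.2f}")
print(f"largest group-average change:  {largest_group_change:.3f} "
      f"(true per-subject effect = 1.0)")
```

In this toy, the changed region is found in nearly every individual, while the group-averaged map shrinks to a small fraction of the per-subject effect. How close real brains come to this extreme is exactly the open question.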

Either way, here is the paper that I just submitted to Med. I would like to thank my coauthors, Javier Gonzalez-Castillo, Dan Handwerker, Paul Taylor, Gang Chen, and Adam Thomas, for all their insights and for helping to write it. One last note: since this paper was a commentary, I was limited to 3000 words and 15 references. Otherwise it would have been much longer with many more relevant references.


The challenge of BWAS: Unknown Unknowns in Feature Space and Variance

Peter A. Bandettini1,2, Javier Gonzalez-Castillo1, Dan Handwerker1, Paul Taylor3, Gang Chen3, Adam Thomas4

1 Section on Functional Imaging Methods

2 Functional MRI Core Facility

3 Scientific and Statistical Computing Core Facility 

4 Data Science and Sharing Team

National Institute of Mental Health

Bethesda, MD 20817

Abstract:

The recent paper by Marek et al. (Reproducible brain-wide association studies require thousands of individuals, Nature, 602, 7902, pp 654-660, 2022) has shown that to capture brain-behavioral phenotype associations using brain measures of cortical thickness, resting state connectivity, and task fMRI, thousands of individuals are required. For those outside the field of human brain mapping and even for some within, these results are potentially misunderstood to imply that MRI or fMRI lack sensitivity or specificity. This commentary expands on what was touched upon in the Marek et al. paper and focuses a bit more on fMRI. First, it is argued that fMRI is exquisitely sensitive to brain activity and modulations in brain activity in individual subjects. Here, fMRI advancement over the years is described, including examples of sensitivity to robustly map activity and connectivity in individuals. Secondly, the potential underlying – yet still unknown – factors that may be driving the need for thousands of subjects, as described in the Marek paper, are discussed. These factors may include variation in individuals’ anatomy or function that is not accounted for in the processing pipeline, sub-optimal choice of features in the data from which to differentiate individuals, or the sobering reality that the mapping between behavior (including behavior differences) and brain features, while readily tracked within individuals, may truly vary across individuals enough to confound and limit the power of group comparison approaches – even with fully optimized pipelines and feature extraction approaches. True human variability is a potentially rich area of future research – that of more fully understanding how individuals expressing similar behavior vary in anatomy and function. A final source of variance may be inaccurate grouping of populations to compare. Behavior is highly complex, and it is possible that alternative grouping schemes based on insights into brain-behavior relationships may stratify differences more readily. Alternatively, allowing self-sorting of data may inform dimensions of behavior that have not been fully appreciated. Finally, potential ways forward to explore and correct for the unknown unknowns in feature space and unwanted variance are discussed.

The Emergence and Growth of fMRI:

Human behavior originates in the brain and differences in human behavior also have brain correlates. The daunting task of neuroscience is to trace differences and similarities in behavior over time scales of milliseconds to decades back to the brain, which is organized across temporal scales of milliseconds to years and spatial scales of microns to centimeters. Capturing the salient features across these scales that determine behavior is perhaps the defining challenge of human neuroscience. Insights derived from this effort shape our understanding of brain organization and may provide clinical utility in diagnosis and treatment. Advances in this effort are fundamentally driven by more powerful tools coupled with more sophisticated questions, experiments, models, and analyses.

When functional MRI (fMRI) emerged, it was embraced because activation-induced signal changes are robust and repeatable. Blood oxygen level dependent (BOLD) contrast allows non-invasive mapping of neuronal activity changes in human brains with high consistency and fidelity on the scales of seconds and millimeters. Because it could be implemented on the already vast number of clinical MRI scanners in the world, its growth was explosive. The activation-induced hemodynamic response, while limited in many ways, has become a widely used and effective tool for indirectly mapping human brain activation. It is indirect because it relies on the spatially localized and consistent relationship between brain activation and hemodynamic changes that result in an increase in flow, volume, and oxygenation. Increases in flow are measured with techniques such as arterial spin labeling (ASL), volume with techniques such as vascular space occupancy imaging (VASO), and blood oxygenation with T2* or T2 weighted contrast (i.e. BOLD contrast). BOLD contrast is far and away the most common of the techniques because of its ease of implementation and because it has the highest functional contrast of the three.

Early on, richly featured and high-fidelity motor and sensory activation maps were produced, followed quickly by maps of cognitive processes and more subtle activation. Then resting state fMRI emerged in the late 1990s, demonstrating that temporally correlated spontaneous fluctuations in the BOLD signal organized themselves into coarse networks across hundreds of nodes. The study of the functional significance of these networks rapidly followed, accompanied by revelations that these networks dynamically reconfigured over time, and were modulated in association with specific tasks, brain states, or measures of performance(1).

Functional MRI has flourished over three decades in large part because of its success in creating detailed and informative maps of brain activation in individuals in single scanning sessions. At typical resolutions, the functional contrast to noise of fMRI is about 5/1, depending on many factors. This robustness has enabled fMRI to delineate, at the individual level, activity changes associated with vanishingly subtle variations in stimuli or task, learning, attention, and adaptation, to name a few. Additionally, in quasi-real time, fMRI has successfully provided neuro-feedback to individuals, leading to changes in connectivity and, in some cases, behavior(2). Clinically, fMRI is increasingly used for presurgical mapping of individuals(3). There is no doubt that the method itself is sufficiently robust and sensitive to be applied to individual subjects to map detailed organization patterns as well as subtle changes with interventions.

Functional MRI has been taken further. Voxel-wise patterns of activity within regions of activation in individuals were shown to delineate subtle variations in task or stimuli. This pattern-effect mapping, known as representational similarity analysis(4), has shown continued success and growth. Because each pattern is subject and even session-specific, it currently defies multi-subject averaging; however, approaches such as hyper-alignment(5) show promise even at this level of detail. 

Over time, fMRI signal has been shown to be stable, repeatable, and sensitive enough to reveal induced differences in activity as an individual brain learns, adapts, and engages. Functional MRI can consistently delineate functional activation in individual brains – going so far as to be able to allow approximate reconstruction of the original stimuli, from activation patterns associated with movie viewing or sentence reading(6,7). All these approaches rely on within-individual contrasts, thus sidestepping the less tractable problem of variance across subjects. 

For “central tendency” mapping, it was determined that combining data across subjects shows the generalizability from individuals to a population. The “central tendency” effects and derived time courses are more stable but inevitably minimize or remove more subtle effects that population subsets may reveal. These approaches are negatively impacted by variation in structure and function that may be unaccounted for or defy current best practices in spatial normalization and alignment. 

Over the past three decades, since fMRI and structural MRI have been able to provide individualized information, the desire has been to go beyond central tendency mapping to reveal individual differences in activation, connectivity, and function. With “standard” clinical MRI scans of the brain, lesions, tumors, and vascular or gross structural abnormalities have been straightforward for a trained radiologist to identify; however, psychiatric and most behavioral differences have brain correlates that are much too subtle for standard clinical MRI approaches. An effort has been made over at least the past two decades to pool and average functional and/or structural images together towards the creation of reproducible and clinically useful biomarkers. No one doubts that differences between individuals or truly homogeneous groups reside in the brain; however, whether they can be seen robustly or at all at the specific temporal and spatial niche offered by structural and functional MRI remains an open question. This question remains open because the brain is organized across a wide range of temporal and spatial scales and the causal physical mechanisms that lead to trait or state differences are not currently understood. At this stage, neuroscientists and clinicians are using fMRI to determine if any signatures related to behavioral or state differences can be robustly seen at all. It may well be that distinct brain differences across many scales can lead to similar trait differences or it may be that they reside at a spatial or temporal scale – or even magnitude – that is outside of what fMRI or MRI can capture. It remains to be fully determined.

The challenge of the Marek paper:

The recent paper by Marek et al. (8) has argued that behavioral phenotype variations associated with variations in cortical thickness, activation, and resting state connectivity, which they termed Brain-Wide Associations (BWAS) as measured with MRI, are reproducible only after thousands of individuals are considered. The authors of the paper suggest that the unfortunate reality is that the effect sizes are so small that reproducible studies require about two thousand subjects, and would benefit somewhat from further reduction in time series noise and multivariate analysis approaches. It is good news that we can get an effect, but for many invested in fMRI studies with this goal, this may be cause for despair and confusion. How is it that we can map individual brains so robustly, efficiently, and precisely, yet require so many subjects to derive any meaningful result when looking for differences in this readily mapped functional and structural information? 

While single subjects can produce robust activation and connectivity maps, the differences in activation or structure as they relate to differences in traits across individuals are so subtle and/or so variable that thousands of subjects are required to see emerging (i.e., “central tendency”) effects – and these may be just the most robust effects. Put another way, if the unwanted variability across subjects were vanishingly small, then the results of Marek et al would suggest that the BWAS-related differences in measured activation, structure, or connectivity would be about three orders of magnitude smaller than the main effect that is commonly seen in individual maps (1 subject required for an activation map vs 1000 subjects required for a reliable difference). Given the much more readily observed changes seen while tracking individuals longitudinally as they change state, the small difference explanation seems highly unlikely. Therefore, the need for thousands of subjects is more likely explained predominantly by the unwanted and unaccounted for variance in trait-relevant or processing pipeline-related structural, activation, or connectivity patterns.

The problem or challenge, as it exists, is not primarily with the sensitivity or specificity of fMRI or structural MRI. Rather it likely resides in the uncharacterized and tremendously large variation in observed brain-behavior relationships across individuals. The underlying brain structure-function relationships, as measured with fMRI or MRI, that may be different for, say, a depressed individual may be numerous, subtle, and idiosyncratic. The study of BWAS is an attempt to determine the most common brain-based causes from a turbulent sea of possible causes across individuals. The Marek et al study has shown that this challenge is more profound than most of us may have imagined – at least on the temporal and spatial scale that we have access to through our tools. It may also be true that those effects that we do eventually see after studying groups of thousands of subjects are but a small fraction of the dispersed effects unique to each individual – and that those that we are able to observe are not necessarily the most influential to the trait observed, as they are simply, by definition, the most commonly observed. 

Marek et al have done a service to the field by pointing out concerns for a type of fMRI study that has widespread interest but, so far, relatively few reported studies. Their work may be interpreted to suggest that, given the formidable number of subjects needed, BWAS-style studies are not a practically tenable use of fMRI. This conclusion should be tempered by an alternate view. Large databases of deeply characterized subjects may be queried in many different ways, potentially increasing their utility into the future. The authors also point out that the effect sizes shown are at least comparable to those of large database genome-wide association studies (GWAS). Improvement is still likely. It’s important to make sure the field of fMRI has done due diligence in being certain that it has minimized the irrelevant variance across subjects as it manifests through our techniques for determining function and in our techniques for pooling multi-subject data.
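The general link between small effect sizes and large samples can be sketched with a textbook power calculation. The snippet below uses the standard Fisher z approximation for detecting a Pearson correlation; it is not the replication-based analysis used by Marek et al, and the function name, the 5% two-sided alpha, the 80% power target, and the example correlation values are illustrative assumptions only.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a true correlation r.

    Uses the Fisher z transform: n = ((z_alpha + z_beta) / atanh(r))^2 + 3.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


# Illustrative correlation strengths (assumed, not taken from the paper).
for r in (0.5, 0.3, 0.1, 0.05):
    print(f"r = {r:<4}  ->  roughly {n_for_correlation(r)} subjects")
```

Because the required sample grows roughly as 1/r², halving the effect size about quadruples the number of subjects needed, which is why any reduction in unwanted cross-subject variance pays off so steeply.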

The unknown unknowns in feature space and variance:

Is there something we are missing – hidden sources of irrelevant variance, inaccurate choices in feature space, or mischaracterization and therefore mis-grouping of behavioral phenotypes – that are suppressing the more informative features and thus reducing effect size? In the tables below, the “unknown unknowns” in understanding BWAS power and possible approaches to address them are described. Table 1 lists potential unknown confounders that may be reducing BWAS power. Within this table are some considerations on how to understand and address these unknowns. Much more could be said for any of these, and indeed work is already taking place worldwide on all these topics. Table 2 lists other considerations that are not necessarily unknowns but areas of active research that should also be considered when designing BWAS or perhaps any fMRI study.

Table 1: Potential Confounders that are not fully understood nor addressed:

Confound | Description
Resting State fMRI | What really is resting state fMRI – aperiodic bursts of synchronized activity? How much is conscious? How much is arousal? How much is breathing? How does it vary with brain state, prior tasks, time of day, etc.? How deeply can we truly interpret correlated time series signals, given that the correlation depends on signal phase, shape, and underlying noise – all of which could change, implying a change in connectivity where there is none, or vice versa? As easy as it is to implement resting state in the scanner, without more precise ways of dissecting and interpreting the most informative aspects of this signal, other approaches might be more powerful. At the very least, external measures that help inform the analysis of resting state (e.g., eye tracking or alertness measures) are needed.
Spatial Normalization | Individual brain anatomy varies as a function of spatial scale. Transforming brains to normalized and standardized space may be removing informative features. Nonlinear warping and registration approaches have advanced over the years yet remain far from perfect. One source of imperfection is anatomical: when aligning brains with strongly varying sulcal and gyral patterns, diffeomorphic warp fields have errors in some areas. On a coarser scale, brains have regionally differing gyral and sulcal patterns as well as different functional/structural relationships. Echo planar images have additional warping due to field inhomogeneities.
Parcellation | If a standard parcellation template is applied to a cohort of normalized brains, the mismatch between the true functional delineation of each parcel in each subject’s brain and the applied parcellation may be profound, causing extreme mixing of the signal between adjacent parcels. It may also result in misidentified parcels: a subject’s region X is, in reality, mostly in region Y, so it gets binned and compared with wrong information, either washing out real effects or pointing to false ones. Effects from small parcels may be entirely washed out. Additionally, it’s likely that the typical parcels are substantially larger than most informative cortical units. A difference between groups may reside as a connection difference involving a small sub-component at the border of one parcel, which may be mixing with the signal from other parcels, thus eliminating the effect. Such a useful feature, if it existed, would be invisible in the analysis described in Marek et al. The variation between functionally derived individual subject parcellation maps should be further explored. Misalignment, misregistration, and mis-parcellation may be substantial sources of unwanted variance.
Processing Pipelines | The Marek paper had well-controlled pipelines; however, each pipeline has many steps, well beyond the scope of this perspective piece, that, if varied, might lead to different conclusions. Pipeline comparisons have shown how sensitive results are to processing steps; what is missing, however, is a “ground truth.” Every pipeline likely has shortcomings. Quality control metrics for each time series, combined with efficient visual inspection of the data, are fundamental for the development of more automated methods for identifying and reducing variance in population-level studies.
Population Sorting | Psychosis and intelligence, used here to sort the populations being compared, are likely oversimplifications of highly multidimensional behavioral phenotypes that may have no single correspondence in the brain. If they are all pooled together for comparison, interesting and perhaps strong differences may be washed out. More precise and nuanced pooling of populations or even data driven population sorting (while carefully avoiding circularity, of course) would perhaps improve these results significantly. Behavioral phenotypes and brain measures are high dimensional. As these manifolds are better understood, it’s likely that stronger associations will be obtained with greater efficiency.
Anna Karenina effect | This effect was first suggested by Finn et al (9) and is based on the first line of the famous novel by Tolstoy: “Happy families are all alike; every unhappy family is unhappy in its own way.” It may be that the neuronal correlates of disorders are substantially more variable than the central tendencies of normal populations, reducing the effect size when attempting to discern a single network or set of networks associated with the disorder. This effect may play a role in the distributions of phenotypes even within typical non-pathologic ranges – such as intelligence.
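The resting state entry in Table 1 notes that a measured correlation depends on noise and signal phase as well as on the underlying coupling. A minimal sketch of that point follows; the signal model (a shared Gaussian signal plus independent noise, and a pair of sinusoids with a lag), the function names, and all parameter values are assumptions chosen for illustration, not a model of real BOLD data.

```python
import math
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def coupled_pair_correlation(noise_sd, n=2000, seed=0):
    """Two 'regions' driven by the SAME shared signal plus independent noise."""
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(n)]
    a = [s + rng.gauss(0.0, noise_sd) for s in shared]
    b = [s + rng.gauss(0.0, noise_sd) for s in shared]
    return pearson(a, b)


def lagged_correlation(lag, period=40, n=400):
    """Two identical sinusoids, one delayed by `lag` samples."""
    a = [math.sin(2 * math.pi * t / period) for t in range(n)]
    b = [math.sin(2 * math.pi * (t - lag) / period) for t in range(n)]
    return pearson(a, b)


# Same underlying coupling, different noise levels.
print(f"low noise:  r = {coupled_pair_correlation(0.5):.2f}")
print(f"high noise: r = {coupled_pair_correlation(2.0):.2f}")
# Same waveform, with and without a quarter-cycle delay.
print(f"no lag:            r = {lagged_correlation(0):.2f}")
print(f"quarter-cycle lag: r = {lagged_correlation(10):.2f}")
```

In both cases the relationship between the two signals is unchanged, yet the correlation coefficient, the usual stand-in for "connectivity", moves substantially. Differences in noise or hemodynamic timing across subjects could therefore masquerade as, or hide, connectivity differences.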

Table 2: Other Avenues to Improvement

ApproachDescription
Dynamic resting state fMRIWhat really is resting state fMRI? Is it aperiodic bursts of synchronized activity that is transformed through the hemodynamic response to low frequency fluctuations? How much arises from conscious experience(10)? How much is arousal? How much is breathing? How does it vary with brain state, prior tasks, time of day, etc.? How deeply can we (or should we) interpret correlated time series signals as the correlation depends on signal phase, shape, and underlying noise – all that could change, implying a change in connectivity where there is none, or vice versa. As easy as it is to implement resting state in the scanner, without more precise ways of dissecting and interpreting the most informative aspects of this signal, other approaches might be more powerful. At the very least, external measures that help inform the analysis of resting state (e.g., eye tracking or alertness measures) are needed.  
Naturalistic Stimuli | Engaging subjects in passive or minimally demanding yet time-locked tasks has been shown to produce more stable connectivity maps and opens up new options for analysis. For instance, movie watching or story listening allows model-driven or cross-subject correlation analysis and helps to tease apart informative elements of ongoing brain activity(11,12). Time-locked continuous engagement in a task may also be optimized to differentiate behavioral phenotypes, serving as a “stress test” in the way that cardiac stress tests are used to identify latent pathology. Continuously engaging tasks also control for changes in vigilance over time, which have been shown to be a confound.
Task fMRI | Like the movies mentioned above, a well-chosen set of tasks may serve to better stratify effects across individuals and populations. Specific tasks could be optimized to produce a large range of fMRI responses depending on the question and associated behavioral measures. The field of fMRI has evolved a massive array of tasks, able to selectively activate a wide range of networks. With more precise control over activation magnitude and location, as well as precise monitoring of task performance with each response, selective dissection of differences might improve.
Spatial Resolution | Differences may reveal themselves more clearly at the resolution of cortical layers or columns, which fMRI is capable of imaging. Here, however, the problem of spatial normalization and registration becomes even more severe and remains unsolved by any automated process. For example, an early fMRI paper demonstrated clear differences in ocular dominance column distribution in patients with amblyopia. If these data were put through the pipelines used in the Marek paper, the results would likely fall well below any statistical threshold or measure of replicability, as the useful features are much finer than the spatial error inherent to spatial normalization, not to mention that ocular dominance columns are quasi-random, thus defying any current normalization scheme. We need to improve our ability to identify and use features such as these in a principled manner before we can make conclusive statements about the effect sizes derivable with fMRI.
Time Series Variance | In these data, physiological noise dominates over the better-understood thermal noise. Methods for reducing time series variance were mentioned in Marek et al. Novel acquisition approaches such as multi-echo fMRI may help, along with external measures of breathing, vigilance, and other contributors to variance. Even with these measurements, robust ways of using them to eliminate this variance, or perhaps to associate it with phenotype, require substantial further development. It should be emphasized that if the field fully succeeds in eliminating all physiological noise from the data, then rather than having a ceiling temporal signal to noise ratio (SNR) of 100/1, the temporal SNR would be limited only by the intrinsic image SNR determined by the scanning parameters and the RF coil, allowing perhaps an order of magnitude improvement in temporal SNR.
Other fMRI and MRI Features | Correlation is but one feature of the fMRI time series. Other features such as entropy, network configuration dwell time, the sequence of network configurations over time, mutual information, and even standard deviation may prove to be more robust and informative. The activation-elicited fMRI signal itself can be further reduced to other features such as latencies, undershoots, transients, NMR phase, and much more. Perhaps all of these contain independent information that may be leveraged in multivariate analysis to increase power. Structural features such as gyrification, fractal dimension, global T1, T2, etc. may also be more informative than gray matter thickness.
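
The temporal SNR ceiling described in the Time Series Variance row can be illustrated with a commonly used simplification in which physiological noise scales in proportion to the signal while thermal noise does not. The sketch below assumes that model. The proportionality constant `lam` is a hypothetical value, set to 0.01 only because that reproduces the ceiling of 100/1 mentioned above, and the image SNR values in the loop are likewise illustrative.

```python
import math

def temporal_snr(image_snr, lam=0.01):
    """Temporal SNR under the assumed model: thermal noise plus
    physiological noise whose standard deviation is lam times the signal."""
    return image_snr / math.sqrt(1.0 + (lam * image_snr) ** 2)

for snr0 in (50, 100, 500, 1000):
    with_physio = temporal_snr(snr0)               # saturates toward 1 / lam
    without_physio = temporal_snr(snr0, lam=0.0)   # physiological noise fully removed
    print(snr0, round(with_physio, 1), without_physio)
```

In this model, raising the image SNR alone gives diminishing returns because temporal SNR approaches 1/`lam`. With the physiological term removed, temporal SNR simply equals the image SNR, which is where the potential order of magnitude gain comes from when the image SNR is near 1000.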

In summary, Marek et al. provide a sobering snapshot of the state of BWAS using MRI and fMRI. The study of brain-wide associations(13), like the study of genome-wide associations(14), does have promise, but it has only just begun the work of objectively identifying and extracting the most meaningful features and of identifying and removing the confounding variance from the signal, in both time and space. We are at an early stage in this promising research. The Marek et al. study has performed a profound service by clarifying, quantifying, and highlighting the challenge.
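
One simple instance of confounding variance, raised in the resting state row of Table 2, is that a measured correlation depends on noise as well as on coupling. The sketch below assumes two regions that share one underlying signal and each add independent Gaussian noise. In that case the expected correlation is signal variance divided by signal plus noise variance. All function names and parameter values are illustrative assumptions, not anything specified by Marek et al.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def expected_correlation(signal_sd, noise_sd):
    """Expected correlation when both series share one signal and
    each adds independent noise of the same standard deviation."""
    return signal_sd**2 / (signal_sd**2 + noise_sd**2)

def simulate_correlation(noise_sd, n_timepoints=5000, seed=0):
    """Simulate two 'regions' driven by the same unit-variance signal."""
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(n_timepoints)]
    x = [s + rng.gauss(0.0, noise_sd) for s in shared]
    y = [s + rng.gauss(0.0, noise_sd) for s in shared]
    return pearson(x, y)

# The shared signal (the "connectivity") is identical in every case; only noise changes.
for noise_sd in (0.5, 1.0, 2.0):
    print(noise_sd, expected_correlation(1.0, noise_sd),
          round(simulate_correlation(noise_sd), 3))
```

A drop in measured correlation between two groups or two sessions could therefore reflect a difference in noise, such as motion, breathing, or vigilance, rather than a difference in connectivity.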

The study of individuals and how they change with time, natural disease progression, or interventions will continue. In fact, large-population longitudinal studies in which each participant is directly compared with themselves at an earlier time, and then compared across the cohort, will likely have a high yield of deep insights into brain differences and similarities(15). These studies are difficult but worth pursuing, as they avoid many of the potential pitfalls of BWAS related to between-subject variability, as described in Marek et al.
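
A rough sketch of why within-subject comparison helps: assume each measurement is the sum of a stable subject-specific baseline (standard deviation `between_sd` across people) and independent session noise (standard deviation `within_sd`). When a participant is compared with themselves, the baseline cancels. The function names and the numbers below are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def se_cross_sectional(between_sd, within_sd, n):
    """Standard error of a group-mean difference from two
    independent groups of n subjects, each scanned once."""
    return math.sqrt(2.0 * (between_sd**2 + within_sd**2) / n)

def se_longitudinal(within_sd, n):
    """Standard error of the mean within-subject change from n
    subjects scanned twice; the stable baseline cancels out."""
    return math.sqrt(2.0 * within_sd**2 / n)

n = 50
cross = se_cross_sectional(between_sd=3.0, within_sd=1.0, n=n)
paired = se_longitudinal(within_sd=1.0, n=n)
print(round(cross, 3), round(paired, 3), round(cross / paired, 2))
```

Under these assumptions the advantage of the longitudinal design is the factor sqrt(1 + between_sd²/within_sd²), so it grows with the ratio of between-subject to within-subject variability. It shrinks if the baseline itself drifts between sessions, which this sketch does not model.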

Individual or small-N fMRI will continue, as insights into healthy brain organization and function are still being derived at an increasing rate while the field develops methods to extract more subtle information from the data. Individual fMRI for presurgical mapping, real-time feedback, and neuromodulation guidance also continues, with extremely promising progress.

Evolving fMRI from central tendency mapping to identifying differences in individuals has proven to be deeply challenging. As the field continues working to address this challenge, it will likely uncover unique sources of variance residing in every step of acquisition and analysis, as well as yet-uncovered structure in idiosyncratic brain-behavior relationships. The fMRI signal is intrinsically strong, reproducible, and robust, as has been shown over the past 30 years. To use it to compare individuals, we need to delve much more deeply into how individuals and their brains vary so we can identify and minimize the still unknown nuisance variance and maximally use the still unknown informative variance. Once we can do this, the effect sizes and replicability promise to reach a useful level with fewer required subjects. In the process of this work, new principles of brain organization will likely be derived. Perhaps before the field rushes ahead to collect more two-thousand-subject cohorts, it should explore, understand, and minimize the unknown unknowns in the feature space and variance among individuals.

References

1. Newbold DJ, Laumann TO, Hoyt CR, Hampton JM, Montez DF, Raut RV, et al. Plasticity and Spontaneous Activity Pulses in Disused Human Brain Circuits. Neuron. 2020;1–10.

2. Ramot M, Kimmich S, Gonzalez-Castillo J, Roopchansingh V, Popal H, White E, et al. Direct modulation of aberrant brain network connectivity through real-time NeuroFeedback. Elife. 2017;6:e28974.

3. Silva MA, See AP, Essayed WI, Golby AJ, Tie Y. Challenges and techniques for presurgical brain mapping with functional MRI. NeuroImage Clin. 2018 Jan 1;17:794–803.

4. Kriegeskorte N, Mur M, Bandettini P. Representational similarity analysis – connecting the branches of systems neuroscience. Front Syst Neurosci. 2008 Nov;2(NOV):2007–8.

5. Haxby JV, Guntupalli JS, Nastase SA, Feilong M. Hyperalignment: Modeling shared information encoded in idiosyncratic cortical topographies. Elife. 2020;9:e56601.

6. Pereira F, Lou B, Pritchett B, Ritter S, Gershman SJ, Kanwisher N, et al. Toward a universal decoder of linguistic meaning from brain activation. Nat Commun. 2018 Mar 6;9(1):963.

7. Nishimoto S, Vu AT, Naselaris T, Benjamini Y, Yu B, Gallant JL. Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies. Curr Biol. 2011 Oct 11;21(19):1641–6.

8. Marek S, Tervo-Clemmens B, Calabro FJ, Montez DF, Kay BP, Hatoum AS, et al. Reproducible brain-wide association studies require thousands of individuals. Nature. 2022 Mar;603(7902):654–60.

9. Finn ES, Glerean E, Khojandi AY, Nielson D, Molfese PJ, Handwerker DA, et al. Idiosynchrony: From shared responses to individual differences during naturalistic neuroimaging. NeuroImage. 2020 Jul;215:116828–116828.

10. Gonzalez-Castillo J, Kam JWY, Hoy CW, Bandettini PA. How to Interpret Resting-State fMRI: Ask Your Participants. J Neurosci. 2021 Feb 10;41(6):1130–41.

11. Hasson U, Nir Y, Levy I, Fuhrmann G, Malach R. Intersubject Synchronization of Cortical Activity during Natural Vision. Science. 2004 Mar;303(5664):1634–40.

12. Finn ES. Is it time to put rest to rest? Trends Cogn Sci. 2021 Dec 1;25(12):1021–32.

13. Sui J, Jiang R, Bustillo J, Calhoun V. Neuroimaging-based Individualized Prediction of Cognition and Behavior for Mental Disorders and Health: Methods and Promises. Biol Psychiatry. 2020 Dec 1;88(11):818–28.

14. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet. 2017 Jul 6;101(1):5–22.

15. Douaud G, Lee S, Alfaro-Almagro F, Arthofer C, Wang C, McCarthy P, et al. SARS-CoV-2 is associated with changes in brain structure in UK Biobank. Nature. 2022 Apr;604(7907):697–707.