Citation: DONG Chenjie, WANG Zhengye, LI Gantang, WANG Suiping. The Neurocognitive Basis of Multisensory Processing[J]. Journal of South China Normal University (Social Science Edition), 2025, (1): 90-105.
Multisensory processing is a fundamental cognitive function that underlies perception, attention, memory, language, and learning. Clarifying its neurocognitive mechanisms has both theoretical and practical significance: it sheds light on the principles of the human mind and can guide the development of multisensory artificial intelligence. These mechanisms are highly complex, however, so a systematic review is needed to give an accurate picture of recent progress in the field and to inspire future research. In this review, we first introduce the functional characteristics and response principles of multisensory neurons in the brain. We then discuss the mechanisms of cross-modal modulation between primary cortical areas. Next, we focus on the functions and computational mechanisms of multisensory areas and hierarchical multisensory brain networks. Finally, we discuss how the study of human multisensory processing can inform the development of multisensory artificial intelligence.