Auditory perception builds on the ability of the nervous system to detect sequential cues in a continuous, unsegmented stream of acoustic stimulation. The capacity to recognize sequential patterns is not determined entirely by genetic specification; it is also shaped by auditory experience. This report develops and tests principles that enable an artificial neural network to organize itself to detect the most salient sequences in its sound environment.
The network comprises clusters of units linked by time-delayed lateral excitatory connections. Over time, adaptive mechanisms cause individual units to become sensitive to particular sequential patterns. Units within each cluster compete to encode patterns that fall within their initial receptive fields, and connection strengths between units adapt to reflect statistically important dynamic features of the environment. In the resulting network, these connections support real-time sequence recognition. A network built from two such sub-networks is shown to learn more complex sequential patterns: levels farther from the periphery learn to segment and encode increasingly complex patterns using the features developed by the more peripheral sub-networks.
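The abstract does not give the update rules, so the following is only a minimal sketch of how a single cluster of this kind might work, assuming a standard winner-take-all competitive learning rule for the receptive fields and a bounded Hebbian-style rule for one-step delayed lateral weights. The class `SequenceCluster`, the learning rates `lr_ff` and `lr_lat`, the broad tonotopic initialization, and the one-hot "frequency channel" inputs are all hypothetical illustrations, not the author's model.

```python
import random


class SequenceCluster:
    """One cluster of competing units with one-step time-delayed lateral
    excitatory connections (hypothetical sketch, not the report's model)."""

    def __init__(self, n_units, n_inputs, lr_ff=0.1, lr_lat=0.05, seed=0):
        rng = random.Random(seed)
        # Assumed broad, roughly tonotopic initial receptive fields:
        # unit i prefers input channel i % n_inputs but responds to all channels.
        self.w_ff = [
            self._normalize([(1.0 + rng.random()) * (3.0 if c == i % n_inputs else 1.0)
                             for c in range(n_inputs)])
            for i in range(n_units)
        ]
        # w_lat[i][j]: excitation that unit j, active at t-1, sends to unit i at t.
        self.w_lat = [[0.0] * n_units for _ in range(n_units)]
        self.lr_ff = lr_ff
        self.lr_lat = lr_lat
        self.prev_winner = None

    @staticmethod
    def _normalize(v):
        total = sum(v)
        return [x / total for x in v] if total > 0 else list(v)

    def reset(self):
        """Forget the previous winner (e.g. at a syllable boundary)."""
        self.prev_winner = None

    def step(self, x, learn=True):
        """Process one input frame; return (winner, lateral input to winner)."""
        n = len(self.w_ff)
        prev = self.prev_winner
        lateral = [self.w_lat[i][prev] if prev is not None else 0.0 for i in range(n)]
        # Net input = feedforward match to the frame + delayed lateral excitation.
        act = [sum(w * xi for w, xi in zip(self.w_ff[i], x)) + lateral[i] for i in range(n)]
        # Competition within the cluster: winner-take-all.
        winner = max(range(n), key=lambda i: act[i])
        if learn:
            # Competitive learning: pull the winner's receptive field toward the frame.
            self.w_ff[winner] = self._normalize(
                [w + self.lr_ff * (xi - w) for w, xi in zip(self.w_ff[winner], x)])
            # Hebbian-style update of the delayed connection prev -> winner, bounded below 1.
            if prev is not None:
                w = self.w_lat[winner][prev]
                self.w_lat[winner][prev] = w + self.lr_lat * (1.0 - w)
        self.prev_winner = winner
        return winner, lateral[winner]

    def sequence_evidence(self, frames):
        """Total delayed lateral support received along a sequence (no learning)."""
        self.reset()
        total = 0.0
        for x in frames:
            _, support = self.step(x, learn=False)
            total += support
        self.reset()
        return total


def one_hot(c, n):
    """Toy input frame: energy in frequency channel c only."""
    return [1.0 if k == c else 0.0 for k in range(n)]


if __name__ == "__main__":
    n = 4  # hypothetical number of frequency channels
    rising = [one_hot(c, n) for c in range(n)]  # toy upward frequency sweep
    falling = list(reversed(rising))            # toy downward frequency sweep
    cluster = SequenceCluster(n_units=n, n_inputs=n)
    for _ in range(20):  # expose the cluster to upward sweeps only
        cluster.reset()
        for frame in rising:
            cluster.step(frame)
    print("rising:", cluster.sequence_evidence(rising))
    print("falling:", cluster.sequence_evidence(falling))
```

Under these toy settings, only the transitions that occur in the upward sweeps acquire lateral weight, so the rising sequence accumulates delayed lateral support while the falling sequence receives none. This loosely mirrors a direction-selective frequency-modulation detector; stacking a second cluster on the first cluster's winners, as the abstract describes, is not shown.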
The model's performance is evaluated by exposing a two-layer network to a small set of English stop-vowel syllables. Each successive layer develops more selective units: the first layer develops units sensitive to upward and downward frequency modulations, and the second develops units selective for complex auditory features. It is argued that these feature detectors may be useful for auditory pattern recognition.