4.2 Applications inspired by natural mechano-sensory Systems
There are many potential applications for mechano-sensory systems. As the example applications that follow show, there are numerous natural paradigms to draw on for novel design ideas. The barn owl, the cricket, bats and dolphins, and the primate cochlea have each inspired attempts to demonstrate or build mechano-sensory systems. Other useful applications diverge from strict bio-mimicry, such as transforming photonic energy into sound energy so that a blind person has the opportunity to learn to see from the resulting auditory cues.
4.2.1 Auditory Pathway of the Barn Owl [Lazz90]
The barn owl localizes its prey by using timing delays between the two ears to determine azimuth (angle from directly forward) and intensity variations to determine elevation (angle from the horizon). The result is a conformal mapping onto the inferior colliculus (IC) of sound events in auditory space: each sound source is mapped to a specific location in the IC representing its azimuth and elevation with respect to the owl [Lazz90].
The auditory signals from the cochlea divide into two primary pathways that eventually meet in the IC. The first is the intensity pathway and passes through the nucleus angularis (NA), encoding elevation information. This is possible in part due to sound absorption variations caused by feather patterns on the face and neck. The second is the time-coding pathway and passes through the nucleus magnocellularis (NM) onto the nucleus laminaris (NL) where it meets the corresponding signals from the time-coding pathway from the opposite side.
Figure 4.2.1-1 represents the two information pathways leading to the IC. The details of the IC are omitted to keep the focus on the pathway structure. Figure 4.2.1-2 shows a notional concept for coincidence detection in the timing circuits of the NL. As drawn, the spatial location of the output signals represents the same spatial direction (azimuth or heading) of the originating sound source.
Assume the total time it takes sound to travel the distance from one ear to the other is divided into 8 time delays, each denoted as Δt, as shown in the model (Figure 4.2.1-2). A stimulus on the immediate left side of the owl (left side of Figure 4.2.1-2) would travel through the bottom row of delays before the right side received the stimulus, therefore resulting in a correlation on the left side. Similarly, a stimulus on the immediate right side of the owl will result in a correlation on the right side of the model. Stimuli between the immediate left and the immediate right would result in a correlation somewhere between these two extremes.
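The coincidence-detection idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the circuit of [Lazz90]: the names `pulse`, `detector_outputs`, and `best_detector` are hypothetical, each ear's signal is a single unit pulse, time is counted in whole Δt steps, and the two delay lines are wired so that detector 0 is the leftmost output of the model and detector 8 the rightmost.

```python
def pulse(at, length=32):
    """Unit pulse arriving at time step `at` (hypothetical test signal)."""
    return [1 if t == at else 0 for t in range(length)]


def detector_outputs(left, right, n_delays=8):
    """Correlate the two ear signals at each of n_delays + 1 detectors.

    Assumed wiring: at detector k the left-ear signal has passed through
    (n_delays - k) delays and the right-ear signal through k delays.
    """
    outputs = []
    for k in range(n_delays + 1):
        left_delay = n_delays - k
        right_delay = k
        total = 0
        for t in range(len(left)):
            l = left[t - left_delay] if t >= left_delay else 0
            r = right[t - right_delay] if t >= right_delay else 0
            total += l * r  # coincidence: both pulses present at once
        outputs.append(total)
    return outputs


def best_detector(left, right, n_delays=8):
    """Index of the detector with the strongest coincidence."""
    outputs = detector_outputs(left, right, n_delays)
    return outputs.index(max(outputs))
```

With these assumptions, a pulse that reaches the left ear eight steps before the right ear activates detector 0, the reverse case activates detector 8, and simultaneous arrival activates the middle detector 4.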
Time-coding Auditory System
The time-coding architecture of the barn owl is implemented in the silicon auditory localization circuit [Lazz90] as shown in Figure 4.2.1-3. Sound enters the system from the left and right ears into the respective silicon cochleae described in the previous section. From there 62 equally-spaced taps (representing the basilar membrane neurons in the natural cochlea) encode the spectral signature at each side. Each tap feeds a hair-cell circuit that performs half-wave rectification, nonlinear compression, and action potential generation. The action potentials in the silicon version are fixed-width, fixed-height pulses. As in natural neurons, the frequency of the action potential pulses represents the intensity, and the timing preserves the temporal characteristics of the signal.
The details of the hair-cell circuits are shown in Figure 4.2.1-4. The half-wave rectifier and nonlinear compression simulate the inner hair cells and the action-potential generator simulates the natural spiral ganglion cells that take signals from the cochlea in owls, primates, and other species. For the barn owl, these circuits feed the NL-model delay lines like the ones modeled in Figure 4.2.1-2.
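The three hair-cell stages can be illustrated with a short Python sketch. It is only an illustration under assumptions, not the analog circuit of Figure 4.2.1-4: the logarithm used for compression, the `gain` and `threshold` values, and the names `hair_cell_spikes` and `tone` are all hypothetical choices.

```python
import math


def hair_cell_spikes(samples, gain=10.0, threshold=1.0):
    """Return a 0/1 spike train for a sampled input signal.

    Stages (all parameter values are assumptions):
      1. half-wave rectification,
      2. logarithmic compression,
      3. integrate-and-fire pulse generation with fixed-height pulses.
    """
    charge = 0.0
    spikes = []
    for x in samples:
        rectified = max(x, 0.0)                     # half-wave rectifier
        compressed = math.log1p(gain * rectified)   # nonlinear compression
        charge += compressed
        if charge >= threshold:                     # emit one fixed pulse
            spikes.append(1)
            charge = 0.0
        else:
            spikes.append(0)
    return spikes


def tone(amplitude, freq_hz=1000.0, rate_hz=20000.0, n=200):
    """Sampled sine tone used as a hypothetical test input."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / rate_hz)
            for i in range(n)]
```

In this sketch a louder tone produces more pulses per cycle than a quiet one, and pulses occur only during the positive half-cycles, so pulse rate carries intensity while pulse timing follows the waveform.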
4.2.2 Robotic Implementation of Cricket Phonotaxis [Webb01, Webb02]
The male cricket gives a mating call to attract female crickets, and a female can find a specific male using phonotaxis, which means movement in response to sound stimulus. In the presence of other noises, the female uses these auditory cues to cover 10 to 20 meters through vegetation and terrain and around obstacles to find the calling male. Phonotaxis is typically seen as a series of start-stop movements with corrective turns.
The “cricket robot” implementing phonotaxis in this example can be modeled as first recognizing the correct song, and then moving toward the source. Each species has a specific sound characterized by a carrier frequency and a temporal repetition structure. A typical pattern is a ten to thirty millisecond syllable of a pure tone (around 4-5 kHz) grouped in distinctive patterns, or chirps. A primary cue serving to discriminate between species is the syllable repetition interval in the song. The correct recognition of this conspecific (same species) song is required before migration toward the source.
The cricket does not use time-delay signals between two ears as mammals do nor can it detect phase of the incoming signal. The geometry of the anatomical structure compensates for this inability and gives the cricket the same capability without the complex circuitry. It has an eardrum on each leg connected by an air-filled tracheal tube and two additional openings on the cricket body. Sound reaches each eardrum in two primary paths: one is direct, striking the eardrum on the same side of the cricket as the sound, and the other is indirect, coming from the opposite side of the cricket body. Since these acoustical vibrations are on opposite sides of the eardrum, their effect generally cancels. However, there is a delay due to a longer path-length as well as a delay due to the tracheal tube properties. These delays cause phase differences between the opposing acoustic signals so that the amplitudes do not cancel.
The robotic model of cricket phonotaxis includes a programmable electronic sound source for modeling the cricket call, and a neural network modeling the dynamics of cell membrane potentials. The neural network model is not a generic architecture, but a specific architecture designed to mimic the neuronal structure of the cricket more closely:
“The architectures represent neural processes at appropriate levels of detail rather than using standard artificial neural net abstractions. Individual neuron properties and identified connectivity are included, rather than training methods being applied to generic architectures.” [p. 3, Webb01]
The robot is a modification of an existing miniature robot (Khepera, “K-team 1994”) that is 6 cm in diameter and 4 cm high. It was chosen because it is closer to cricket size than other available robots, although it is still much more massive than a cricket. A modification for ears added another 6 cm in height. The robot has 2 drive wheels and 2 castors and is programmed in C on a 68332 processor. Due to processor speed limitations, the neuronal model had to be simplified to run in real time. This is a common theme in biomimetic systems: although conventional processors are 5 or 6 orders of magnitude faster than biological neurons, we still must make sacrifices in computations to achieve any semblance of real-time biomimicry.
Figure 4.2.2-1 shows the simulated neuronal interconnects for the cricket robot. The separation between the microphone ears can be varied but is set at one-quarter of the wavelength of the mimicked species' carrier frequency. Another one-quarter period delay is programmed into the inhibitory connection to simulate the delay in the tracheal tube. The inverter (gain of –1) simulates the opposing effects of the direct and indirect pathways striking the eardrum on opposite sides. In real crickets, the auditory neuron sends signals to the brain, where the connectivity and functionality are not yet understood. The robotic model includes membrane potentials that result in action potential (spike) signal generation, but the reduction to four simple neurons was done in the robotic implementation in part to keep the simulation operating in real time.
Each time a motor neuron in Figure 4.2.2-1 generates an action potential, the robot moves incrementally in that direction. The auditory neurons fire (send action potentials) when the threshold for firing is exceeded. All neurons exhibit leaky integration so that stray noises will not result in action potentials. A constant input stronger than the signal being leaked out must be sustained in order to bring the neuron to firing an action potential. However, the auditory neurons fire rapidly once initiated. This is modeled by returning the membrane potential closer to the threshold (-55 mV typ.) after an action potential instead of returning to the resting potential (-70 mV typ.).
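The leaky-integration behavior described above can be sketched as a simple leaky integrate-and-fire update. This is a minimal sketch, not the neuron model of [Webb01]: the function name `simulate_neuron`, the time step, the time constant `tau`, the drive values, and the reset potential of -58 mV are assumptions chosen for illustration. Only the -70 mV resting and -55 mV threshold potentials come from the text.

```python
def simulate_neuron(drive, dt=0.1, tau=5.0,
                    v_rest=-70.0, v_thresh=-55.0, v_reset=-58.0):
    """Return the step indices at which the model neuron fires.

    drive   : input per step in mV/ms (assumed units)
    dt, tau : time step and leak time constant in ms (assumed values)
    v_reset : potential after a spike; placing it near threshold lets
              the neuron fire rapidly once it has started.
    """
    v = v_rest
    spike_steps = []
    for step, current in enumerate(drive):
        # Leak pulls v back toward rest; the input pushes it upward.
        v += dt * (-(v - v_rest) / tau + current)
        if v >= v_thresh:
            spike_steps.append(step)
            v = v_reset
    return spike_steps
```

With these assumed values, a short burst or a weak sustained input leaks away before reaching threshold, while a strong sustained input fires repeatedly. Passing `v_reset=-70.0` instead shows the slower firing that results from resetting all the way to the resting potential.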
The calling frequency is 4.7 kHz to match a specific species, Gryllus bimaculatus. The robot microphones were placed 18 mm apart, which is a quarter wavelength of the 4.7 kHz calling frequency. An additional one-quarter period delay is also programmed into the circuitry as a 53 µs delay. When a signal is received from a right angle to the heading, the combined delays add to one-half period, which, when inverted, combines with the direct signal to give a maximum signal for the motor neuron to turn the robot toward the sound. The opposite motor neuron receives the direct signal and the inverted indirect signal at the same time, so they cancel. When the source is directly in front of the robot, the same signal is received at both motor neurons so that the left-right turning cancels, and the robot continues straight.
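The quarter-wavelength arithmetic and the resulting directional response can be checked with a small phasor calculation. This is a sketch under stated assumptions: a speed of sound of 343 m/s, a pure tone from a distant source, and each ear modeled as its own microphone signal minus the delayed signal from the opposite microphone. The function names are hypothetical and the neurons of Figure 4.2.2-1 are not modeled.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s in air, assumed value


def quarter_wave_design(freq_hz):
    """Return (microphone spacing in m, programmed delay in s)."""
    wavelength = SPEED_OF_SOUND / freq_hz
    return wavelength / 4.0, 1.0 / (4.0 * freq_hz)


def ear_amplitudes(angle_deg, freq_hz, spacing_m, delay_s):
    """Return (left, right) amplitude of direct minus delayed-opposite signal.

    angle_deg: 0 is straight ahead, +90 is directly to the right.
    """
    omega = 2.0 * math.pi * freq_hz
    # Extra acoustic travel time to the far microphone.
    travel = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    right = abs(1.0 - cmath.exp(-1j * omega * (delay_s + travel)))
    left = abs(1.0 - cmath.exp(-1j * omega * (delay_s - travel)))
    return left, right


spacing, delay = quarter_wave_design(4700.0)
print(f"spacing = {spacing * 1e3:.1f} mm, delay = {delay * 1e6:.1f} us")
```

Under these assumptions the spacing comes out near 18 mm and the delay near 53 µs. For a source at 90 degrees the near side doubles in amplitude while the far side cancels, and for a source straight ahead both sides are equal.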
Results and discussion [Webb01]
The ¼-wavelength physical ear separation and the ¼-period programmable delay for a 4.7 kHz carrier proved to mimic biological observation. Experimental results showed that the robot migrated toward a 4.7 kHz signal more strongly than a 2.35 kHz signal and would ignore a 9.4 kHz signal. It would also move toward the 4.7 kHz signal when played simultaneously with a 6.7 kHz signal.
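The geometry alone already suggests this ordering of carrier frequencies. The sketch below is an idealized calculation, not the analysis of [Webb01]: it assumes a speed of sound of 343 m/s, a pure tone arriving from 90 degrees, and spacing and delay fixed at the values tuned for 4.7 kHz. The name `turning_signal` is hypothetical, and the neural stages are ignored.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0            # m/s, assumed
TUNED_HZ = 4700.0                 # carrier the geometry is tuned for
SPACING_M = SPEED_OF_SOUND / TUNED_HZ / 4.0   # quarter wavelength
DELAY_S = 1.0 / (4.0 * TUNED_HZ)              # quarter period


def turning_signal(freq_hz):
    """Near-side minus far-side amplitude for a source at 90 degrees."""
    omega = 2.0 * math.pi * freq_hz
    travel = SPACING_M / SPEED_OF_SOUND       # acoustic delay between mics
    near = abs(1.0 - cmath.exp(-1j * omega * (DELAY_S + travel)))
    far = abs(1.0 - cmath.exp(-1j * omega * (DELAY_S - travel)))
    return near - far


for f in (2350.0, 4700.0, 9400.0):
    print(f"{f:.0f} Hz -> {turning_signal(f):.2f}")
```

In this idealization the left-right difference is largest at 4.7 kHz, smaller at 2.35 kHz, and essentially zero at 9.4 kHz, where the combined delays add to a full period.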
By tuning the time constants, the response could be made selective for a bandpass of syllable rates. In one example, the robot responded to changes in signal direction when the syllables were 20 to 30 ms long but would not respond for shorter or longer syllables. The programmability built into this cricket robot will allow further study into the alternate hypotheses of how crickets and other animal species perform phonotaxis. The system will also allow for further study into non-phonotaxis capabilities of such a sensorimotor system.
Although the four-neuron model does not mimic the complexity of the cricket brain, it does demonstrate a minimal configuration for accomplishing basic phonotaxis functions, such as tracking of sound sources, selectivity for specific frequencies, selectivity for syllable rates, tracking behavior without directional input, and tracking behavior in the presence of other sound sources.
4.2.3 Mead/Lyon Silicon Cochlea [Lyon89]
The Mead/Lyon [Lyon89] Silicon Cochlea is a transmission line of second-order amplifier circuits illustrated in Figure 4.2.3-1. First-order stages are simple circuits such as differentiators or integrators, whose step responses are typically an exponential approach to a steady-state condition. Second-order stages can respond to a step with a damped sinusoidal oscillation and therefore show a peak response at a resonant frequency. In the initial silicon cochlea circuit, there were 100 second-order circuits with 10 voltage taps evenly spaced along the design.
Each second-order circuit is composed of three op-amps and two capacitors configured as cascaded follower-integrator circuits with a feedback amplifier providing oscillatory responses. The transconductance of the feedback amplifier is controlled by an external bias voltage. For low feedback transconductance, the circuit behaves as a two-stage follower-integrator, which follows the input voltage. As the feedback transconductance is increased, positive feedback causes the second follower-integrator to leap ahead slightly and oscillate to a steady-state value. If the transconductance is set too high, the circuit oscillates out of control (goes unstable).
Once appropriately calibrated (tuned), the peak response of each second-order circuit is a function of the input frequency. Since each stage inherently adds a smoothing effect, the individual frequencies of the input voltage signals will have a peak response somewhere along the 100-stage circuit. As in the natural cochlea, the spatial distribution of the voltage taps provides a sample of the Fourier representation of the input voltage signal. However, in the natural cochlea the mechanical design of the basilar membrane provides physical peak deflections (corresponding to signal frequency components present in the input signal), while this design models the mechanical cochlear structure with a bank of second-order electronic filters.
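The way a cascade maps frequency to position can be illustrated numerically. The following is a sketch under assumptions rather than a model of the actual chip: each stage is taken to be an ideal second-order low-pass section with transfer function 1/(1 - x² + jx/Q), where x is the input frequency divided by the stage's corner frequency. The corner frequencies are assumed to fall exponentially from 10 kHz to 100 Hz along the line, Q is assumed to be 0.8, and the function names are hypothetical.

```python
def cascade_gains(freq_hz, n_stages=100, f_high=10000.0, f_low=100.0, q=0.8):
    """Gain magnitude at the output of every stage for a tone at freq_hz."""
    gains = []
    response = complex(1.0, 0.0)
    for i in range(n_stages):
        # Corner frequency decreases exponentially along the cascade.
        corner = f_high * (f_low / f_high) ** (i / (n_stages - 1))
        x = freq_hz / corner
        stage = 1.0 / complex(1.0 - x * x, x / q)  # second-order low-pass
        response *= stage                          # stages are in series
        gains.append(abs(response))
    return gains


def peak_stage(freq_hz):
    """Index of the stage where the cumulative response is largest."""
    gains = cascade_gains(freq_hz)
    return gains.index(max(gains))


for f in (4000.0, 1000.0, 250.0):
    print(f"{f:.0f} Hz peaks at stage {peak_stage(f)}")
```

In this sketch high frequencies peak near the input end of the line and low frequencies peak farther along, which is the place coding that the voltage taps sample.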
4.2.4 MEMS-based electronic cochlea [Andr01]
An example of a micro-electromechanical system (MEMS) approach to a silicon electronic cochlea is described in [Andr01]. MEMS allows mechanical distortion caused by the incident sound energy to change the distance between two polysilicon plates that are implemented as a capacitor. This design concept includes a MEMS-based acoustic pressure gradient sensor and filter bank that decomposes incident acoustical energy into its wavelet components. The pressure transducer is a conventional MEMS polysilicon diaphragm suspended in air over a polysilicon backplate. Inspired by the mechanically-coupled acoustic sensory organs of the parasitoid fly, the transducers are connected by a first-layer polysilicon beam, allowing for pressure gradient measurement. As acoustical energy strikes the external plate, the plate is distorted toward the backplate, reducing the air distance separating the two plates. This causes an increase in the capacitance in response to acoustic pressure, since the capacitance of a parallel-plate structure is inversely proportional to the plate separation. The MEMS silicon cochlea implementation is composed of MEMS filter banks that allow for a real-time wavelet decomposition of the received acoustical energy.
The advantage of the MEMS-based approach over the analog VLSI approach is a lower power requirement, as the physical energy of the sound waves does some of the work of the VLSI transconductance amplifiers. Also, since the MEMS-based approach more closely resembles natural systems, there is a more direct correlation between system response and input acoustical energy.
Another application of MEMS technology for biomimetic robots includes cantilever microswitches to model antenna behavior and provide water-flow sensors. These MEMS-based sensors are being used to model lobster and scorpion behaviors on underwater robotic vehicles [McGr02].
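The relation between plate gap and capacitance can be made concrete with the ideal parallel-plate formula C = ε0·A/d, in which a narrower gap gives a larger capacitance. The sketch below assumes an air gap with permittivity close to that of free space and ignores fringing fields. The plate area and gap values are hypothetical, not dimensions from [Andr01].

```python
EPSILON_0 = 8.854e-12  # F/m, permittivity of free space


def plate_capacitance(area_m2, gap_m):
    """Ideal parallel-plate capacitance in farads (fringing ignored)."""
    return EPSILON_0 * area_m2 / gap_m


# Hypothetical diaphragm: 1 mm x 1 mm plate, 2.0 um gap at rest,
# deflected to a 1.5 um gap by acoustic pressure.
rest = plate_capacitance(1e-6, 2.0e-6)
deflected = plate_capacitance(1e-6, 1.5e-6)
print(f"rest: {rest * 1e12:.2f} pF, deflected: {deflected * 1e12:.2f} pF")
```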
4.2.5 “See-Hear” design for the blind by retraining auditory system [Mead89]
The “See-Hear” concept is intended to help a blind person “see” by hearing different sounds based on objects visible in a head-mounted camera system [Mead89, Ch 13]. Successful implementation requires transforming visual signals into acoustic signals so that users can create a model of the visual world with their auditory system.
Both vision and auditory systems have receptive fields representing data distributions within the local environment. The vision system maps light emissions and reflections from 3D objects onto the 2D photoreceptor mosaic in the retina, whose conformal mapping onto the brain is called the retinotopic map. Similarly, the auditory system takes frequency components of local sound energy and maps a spectrum onto the basilar membrane in the cochlea and subsequently (via cochlear nerve) to a conformal map on the brain called the tonotopic map.
Both vision and auditory systems are concerned with detecting transient events. The vision system detects motion by taking time-space derivatives of the light intensity distribution. Transients help to localize events in both space and time, and the brain constructs a 3D model of the world using motion parallax, which is the apparent object motion against the background caused by observer motion. If an observer is focused on a point at infinity and moves slowly, then nearby objects appear to move rapidly against the infinite background, while objects farther away appear to move more slowly. Transient sounds are also easily detected and localized in the auditory system.
The vision and auditory systems differ in how the peripheral information is processed:
“In vision, location of a pixel in a 2D array of neurons in the retina corresponds to location of objects in a 2D projection of the visual scene. The location information is preserved through parallel channels by retinotopic mapping. The auditory system, in contrast, has only two input channels; location information is encoded in the temporal patterns of signals in the two cochleae. These temporal patterns provide the cues that the higher auditory centers use to build a 2D representation of the acoustic environment, similar to the visual one, in which the position of a neuron corresponds to the location of the stimulus that it detects.” [Mead89]
The key biological vision concepts exploited in the See-Hear chip include [Mead89]:
- Logarithm of light intensity collected at the photoreceptor; using a logarithmic function expands the available dynamic range as compared to a linear function.
- The spatial orientation of light sources (which includes reflected light) is preserved from the photoreceptor mosaic through the retinotopic map
- Depth cues required for mental reconstruction of 3D space are provided by time-derivative signals of the light intensity profile
The key auditory cues for sound localization include:
- Time delay (350–650 microseconds) between ears, providing horizontal placement cue
- Acoustic high-frequency attenuation, providing further horizontal placement cue
- Direct and indirect pathways in the outer ear causing a destructive interference pattern that is a function of elevation, thus providing a vertical placement cue
As in a natural vision system, the See-Hear system accepts photonic energy through a lens and focuses the energy onto a 2D array of pixels. (A pixel is simply a picture element.) Each pixel value represents the light coming from a specific direction in the 3D world. The See-Hear chip includes local processing at each pixel location.
Each pixel processor responds to the time-derivative of the logarithm of the incident light intensity. The incoming photons of light enter the depletion region of a bipolar junction phototransistor, creating electron-hole pairs in quantities proportional to the light intensity. Two diode-connected MOS transistors connected to the emitter cause a voltage drop in response to the logarithm of the light intensity. A MOS transconductance amplifier with nonlinear feedback provides a time-derivative output signal of the pixel processor. Each pixel processor is capacitor-coupled to adjacent pixels so that the chain of pixel processors acts as a delay line.
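A discrete-time sketch shows why the time-derivative of the logarithm is a useful quantity: it responds to relative change (contrast) rather than to absolute brightness. The function name `pixel_response` and the sample intensity values are hypothetical, and the analog circuit is replaced here by a simple difference of successive log samples.

```python
import math


def pixel_response(intensities):
    """Difference of log intensity between successive samples."""
    logs = [math.log(value) for value in intensities]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


dim_scene = [1.0, 1.0, 2.0, 2.0]             # intensity doubles once
bright_scene = [100.0, 100.0, 200.0, 200.0]  # same change, 100x brighter
print(pixel_response(dim_scene))
print(pixel_response(bright_scene))
```

Both scenes give essentially the same response, a single transient of about ln 2 at the moment the intensity doubles and zero while the light is steady.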
Time-derivative signals propagate in two directions in the electronic cochlea circuit, which results in a mimicry of the time delays between the left and right ears. As seen in Figure 4.2.5-1 a transient event in the left visual field will result in sound on the left side before sound on the right side, which mimics the behavior of sound events in auditory systems. The time delay circuit also filters higher frequencies, so that longer delays result in more attenuation of higher frequencies. This feature therefore models the binaural head-shadow, which is the attenuation of high frequencies as the sound travels around the head. The combined effect of delayed signals and high-frequency attenuation of the delay channels serves to combine both natural horizontal localization cues into one circuit.
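The combined delay and high-frequency attenuation can be sketched with a chain of first-order low-pass stages, one per pixel position. This is an illustrative model only: the number of pixels per row, the smoothing factor `alpha`, and the function names are assumptions, and the real chip uses analog delay elements rather than a digital filter.

```python
def propagate(signal, n_stages, alpha=0.5):
    """Pass a signal through n_stages first-order low-pass stages."""
    out = list(signal)
    for _ in range(n_stages):
        state = 0.0
        filtered = []
        for x in out:
            state += alpha * (x - state)  # each stage smooths and delays
            filtered.append(state)
        out = filtered
    return out


def row_outputs(pixel_index, n_pixels=16, length=64):
    """Left and right outputs for a transient at one pixel of a row."""
    event = [1.0] + [0.0] * (length - 1)      # brief transient event
    left = propagate(event, pixel_index)       # stages to the left end
    right = propagate(event, n_pixels - 1 - pixel_index)
    return left, right


left, right = row_outputs(2)
print("left peak at step", left.index(max(left)), "height", max(left))
print("right peak at step", right.index(max(right)), "height", max(right))
```

For an event near the left edge of the row, the left output peaks earlier and higher than the right output, because the signal headed right passes through more smoothing stages.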
Since each pixel processor circuit in the electronic cochlea contains its own photoreceptor circuit, multiple sound sources are processed as a superposition of the individual sources. To model the elevation inputs from the pinna-tragus pathway differences, the See-Hear chip contains an additional delay circuit at each end. The 2D image is focused on a 2D array of pixel processors, and the output of each horizontal row is added to a delayed version of itself to model the mixing of the pathways relevant to the elevation of the objects in the image. The outputs of all rows are then summed together to create only two separate sound signals, one for each ear. If two identical objects are at different elevations within the image, the different pinna-tragus pathway delays at the end of their respective rows will provide the user with an audible cue as to where (in elevation) each object is located.
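Adding a signal to a delayed copy of itself produces a comb-shaped frequency response whose notches depend on the delay, which is how a row-specific delay can encode elevation. The sketch below computes that response for an ideal delay-and-add. The delay values assigned to the two rows are hypothetical, and the function names are not taken from [Mead89].

```python
import cmath
import math


def row_gain(freq_hz, delay_s):
    """Gain of y(t) = x(t) + x(t - delay) for a tone at freq_hz."""
    return abs(1.0 + cmath.exp(-2j * math.pi * freq_hz * delay_s))


def first_notch_hz(delay_s):
    """Lowest frequency at which the two paths cancel completely."""
    return 1.0 / (2.0 * delay_s)


# Hypothetical delays for two rows at different elevations.
for name, delay in (("upper row", 50e-6), ("lower row", 100e-6)):
    print(name, "first notch near", first_notch_hz(delay), "Hz")
```

With these assumed delays the upper row has its first notch at a higher frequency than the lower row, so the same object sounds spectrally different depending on the row in which it appears.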
The user can ultimately learn how to hear a 3D model of the external environment based on what is visually captured with the camera system.
4.2.6 A biomimetic sonar system [Reese94]
A “Biologic Active Sonar System (BASS)” based on echo processing of bats and dolphins was designed to detect and classify mines in shallow water [Reese94]. Front-end filters and nonlinear functions emulating auditory neuronal models were used to obtain high resolution with low frequency sonars (which is another example of coarse coding in natural systems). The intended product of this research is a system implementation into an autonomous underwater vehicle.
Figure 4.2.6-1 shows the block diagram of the BASS processing stages. The band-pass filters (BPF’s) have sharp roll-off characteristics at high frequencies and are broad-band, overlapping other channels significantly (coarse coding). This is inspired by natural peripheral auditory processing and provides good time/frequency definition of the signal as well as increases in-band signal-to-noise ratio (SNR).
As in the vision system, the automatic gain control (AGC) allows for covering a much wider dynamic range, which is based on integrate-to-threshold behavior of auditory signals. This sharpens signal onset time, which translates to sharpening range resolution. The half-wave rectifier and sigmoid function are inherent in mammalian auditory processing and serve to sharpen the onset time and range resolution.
Peak summing and delay provide in-band coherent addition and inter-band signal alignment. This mimics natural biological phase-locked loops and provides pulse compression. The anticipated benefits of such a wide-band, low-frequency design are longer detection ranges and better target recognition of partially buried mines.
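The onset-sharpening effect of a rectifier followed by a sigmoid can be illustrated on a slowly rising envelope. This is a sketch under assumptions, not the BASS implementation: the sigmoid `steepness` and `midpoint`, the linear test ramp, and the function names are hypothetical, since [Reese94] parameters are not given here.

```python
import math


def half_wave_sigmoid(x, steepness=20.0, midpoint=0.5):
    """Half-wave rectify, then squash with a sigmoid (assumed parameters)."""
    rectified = max(x, 0.0)
    return 1.0 / (1.0 + math.exp(-steepness * (rectified - midpoint)))


def rise_time(samples, low=0.1, high=0.9):
    """Number of samples between first crossing of low and of high."""
    first_low = next(i for i, s in enumerate(samples) if s >= low)
    first_high = next(i for i, s in enumerate(samples) if s >= high)
    return first_high - first_low


ramp = [i / 100 for i in range(101)]           # slow onset, 0 to 1
shaped = [half_wave_sigmoid(x) for x in ramp]  # after the nonlinearity
print("input rise time:", rise_time(ramp), "samples")
print("output rise time:", rise_time(shaped), "samples")
```

The shaped signal crosses from its low to its high level in far fewer samples than the input ramp, which is the sense in which the nonlinearity sharpens onset time.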