3.2 Applications inspired by natural photo-sensory systems
The first photosensory application is the author’s own idea: use gaussian filters to emulate the low-pass spatial-temporal filtering of the photoreceptors and horizontal cells, and do so at three levels, each with an inherent delay that can be used in elementary motion detection (EMD) models. The three levels allow modeling of the well-known center-surround contrast signals (propagated by bipolar cells) that make up the magnocellular and parvocellular pathway signals. They also allow for two different EMDs at each location. The additional EMD gives the degree of freedom needed to determine edge velocity.
The next group of research efforts is focused on modeling the outer plexiform layer (OPL) of the retina (photoreceptors, horizontal cells, and bipolar cells) using VLSI circuits. Biology is made of material with a natural plasticity for adapting to the organism’s needs. Silicon is brittle, but very reliable as a technology for implementing the behavior of the OPL. Following those efforts are ones that combine the silicon retina concepts with optic flow for a more comprehensive adaptive pixel that better emulates the OPL.
A few examples of exploiting natural foveal vision are then presented. The densely packed photoreceptors in the very center of the retina provide much better spatial acuity than the periphery, where photoreceptors are not as densely packed. This can be misleading, as our ability to see detail in the very center far surpasses that of the periphery, and the photoreceptor packing is a very small part of that. There are about 5 times as many rod cells as cone cells in the retina, but no rods in the fovea (thus a faint star may disappear when we look right at it). Also, cells are more interconnected in the periphery to afford better temporal resolution at the cost of spatial resolution. Most fovea-vision-inspired applications concern the higher resolution in a region of interest and not the representative rod and cone cell distributions and non-uniform level of cell interconnections.
The group following is focused on asynchronous event-based signaling which, like biology, produces a spike (or action potential) when a significant event happens (or a threshold is exceeded). Diverging from biology into a possible realm of much higher signal processing capability is the notion of doing the same OPL signal processing with photonics rather than electronics. This would be a significant deviation from biology but, as pointed out before, researchers often use biology to glean novel ideas and are not necessarily attempting to duplicate it. Another frontier being pursued is the incorporation of polarization information in vision systems, as described in the final section.
3.2.1 Combined EMD and Magno/Parvo channel model [Brooks18]
The Hassenstein-Reichardt elementary motion detection (HR-EMD) model [Hass56] reviewed earlier cannot accurately measure optic flow velocity. A simplified version of the HR-EMD is shown in Figure 3.2.1-1. The EMD response peaks at an optimal speed determined by the design of the delay element. A weak spatial contrast moving across the image at that optimal speed can produce a moderate response, the same response as a stronger spatial contrast moving at a sub-optimal speed. Another information dimension is therefore needed to determine edge velocity. One approach is to measure the power spectral density (PSD) of the image and combine it with a global EMD response in the form of a look-up table [Wu12], although a PSD measurement of an image is not known to exist in biology. These and similar approaches have used the delay inherent in traditional low-pass filters (LPFs) such as Butterworth filters (popular because they are maximally flat in the pass band). Again, Butterworth filters are not known in biology. The best model of LPFs in biology is the gaussian filter, which is not popular in conventional applications due to properties such as non-orthogonality. However, gaussian filters occur naturally in biology due to ion leakage, charge sharing among receptor cells, and the excitatory and inhibitory signals of adjacent layers of neurons.
Gaussian filters can also model the magnocellular and parvocellular pathways (MP and PP); each channel of the MP or PP can be modeled as a difference-of-gaussian filter between the center receptor (or group of receptors) and the surrounding receptors, referred to as center-surround antagonistic signals. To model either the MP or the PP, two gaussians are needed: a smaller-variance gaussian for the center field and a larger-variance gaussian for the surrounding field. Possibly (a subject for future experimentation) both channels can be modeled with a total of three gaussian filters, where the variance of the surrounding PP signal is the same as the variance of the center MP signal. These three gaussian filters are identified in Figure 3.2.1-2 as having high, medium, and low cutoff frequencies. Keep in mind these are spatial-temporal filters, so the frequencies are multidimensional, covering both time and space. In the primate vision system these spatial-temporal filters would be implemented at each receptor location by the effects of weak inter-photoreceptor connections, the lateral inhibition of the horizontal cells, the propagation of bipolar cells, and the further mediation by the amacrine cells as the signal is passed through the ganglion cells.
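The three-filter idea can be illustrated with a small one-dimensional, purely spatial sketch. The filter widths, the kernel radius, and the edge-replication boundary rule below are assumptions chosen for illustration; they are not taken from [Brooks18], and the temporal dimension is omitted.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1-D gaussian kernel (weights sum to 1)."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_same(signal, kernel):
    """Convolve with edge replication so output length equals input length."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out

def center_surround(signal, sigma_center, sigma_surround, radius=12):
    """Difference-of-gaussians: narrow center minus wide surround."""
    center = convolve_same(signal, gaussian_kernel(sigma_center, radius))
    surround = convolve_same(signal, gaussian_kernel(sigma_surround, radius))
    return [c - s for c, s in zip(center, surround)]

# Assumed widths for the high, medium, and low cutoff filters.
SIGMA_HIGH, SIGMA_MEDIUM, SIGMA_LOW = 1.0, 2.0, 4.0

step_edge = [0.0] * 32 + [1.0] * 32
# The medium filter is shared: surround of the parvo channel, center of the magno channel.
parvo = center_surround(step_edge, SIGMA_HIGH, SIGMA_MEDIUM)
magno = center_surround(step_edge, SIGMA_MEDIUM, SIGMA_LOW)
```

Both channels respond only near the edge and are silent over uniform regions, which is the center-surround behavior described above; the magno channel in this sketch simply has the broader spatial footprint.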
Spatial-temporal gaussian filter effects are well known in vision. The three gaussians in Figure 3.2.1-2 provide the necessary information for both MP and PP channel modeling as well as two separate EMD channels, referred to in the figure as the Parvo EMD and Magno EMD. Having two separate EMD channels gives the additional degree-of-freedom needed for object velocity determination. The initial LPF (with high cutoff frequency) is used as the ‘receptor’ signal in Figure 3.2.1-1 for both EMDs, and the delayed signal is the output of the second LPF (medium cutoff) for the Parvo EMD while the delayed signal is the output of the third LPF (low cutoff) for the Magno EMD.
The object velocity is a function of location in the image, and ambiguity would be expected if only one EMD measurement were available. However, in this model two independent EMD outputs are available, so the object velocity would be determined by some combination of the responses of the Parvo EMD and Magno EMD. Another subject for future experimentation would be how the signals are combined to give the unique velocity. This is very consistent with the coarse coding concepts we see throughout biological sensory systems (and likely higher brain function).
Figure 3.2.1-3 shows how the two separate EMDs can be combined to give a specific object motion velocity at the given location in the receptive field. The outputs of the left and right receptors in this figure would be the output of the high cutoff LPF of Figure 3.2.1-2. The outputs of delays D1 and D2 correspond to the outputs of the medium cutoff LPF and the low cutoff LPF of Figure 3.2.1-2, respectively.
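A minimal sketch of why two EMDs help is given below. It assumes a standard opponent correlator (delayed left times right, minus left times delayed right), first-order temporal low-pass filters standing in for the three gaussian filters, and a drifting sinusoid as the stimulus. The coefficient values and the function names are hypothetical, and the exact wiring of Figure 3.2.1-3 is not reproduced.

```python
import math

# Assumed filter coefficients for the high, medium, and low cutoff LPFs
# (a larger coefficient means a higher cutoff and a shorter delay).
A_HIGH, A_MEDIUM, A_LOW = 0.8, 0.3, 0.1

def low_pass(signal, a):
    """First-order temporal low-pass filter: y += a * (x - y)."""
    out, y = [], 0.0
    for x in signal:
        y += a * (x - y)
        out.append(y)
    return out

def emd(left, right, a_delay):
    """Simplified opponent correlator using a low-pass filter as the delay."""
    d_left, d_right = low_pass(left, a_delay), low_pass(right, a_delay)
    return [dl * r - l * dr for l, r, dl, dr in zip(left, right, d_left, d_right)]

def mean_emd_pair(contrast, period, phase=math.pi / 4, steps=2000, tail=400):
    """Mean Parvo and Magno EMD outputs for a drifting sinusoid.

    period: temporal period in samples (a shorter period means faster motion).
    phase:  fixed phase lag between the two receptors (set by their spacing).
    """
    w = 2.0 * math.pi / period
    raw_left = [contrast * math.sin(w * n) for n in range(steps)]
    raw_right = [contrast * math.sin(w * n - phase) for n in range(steps)]
    # The 'receptor' signals are the outputs of the high cutoff filter.
    left, right = low_pass(raw_left, A_HIGH), low_pass(raw_right, A_HIGH)
    parvo = emd(left, right, A_MEDIUM)   # delay from the medium cutoff filter
    magno = emd(left, right, A_LOW)      # delay from the low cutoff filter
    # Average over the tail only, after the filter transients have died out.
    return sum(parvo[-tail:]) / tail, sum(magno[-tail:]) / tail

def response_ratio(contrast, period):
    """Parvo/Magno ratio: one possible way of combining the two EMDs."""
    parvo, magno = mean_emd_pair(contrast, period)
    return parvo / magno
```

In this sketch each EMD output scales with the square of the contrast, so a single EMD cannot separate speed from contrast. The ratio of the two outputs cancels the contrast term and still varies with the temporal period, which is one candidate for the combination left open above. It is only shown here for a grating of fixed spatial wavelength.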
The effectiveness of this magno/parvo EMD model can be simulated in MATLAB or another numerical tool. Let γ (limited to a value between 0 and 1) control the amount of spatial spreading between frames: the retained value is γ times the current pixel value plus (1-γ) times the average of the values at the 4 nearest neighbors. Let α control the amount of temporal smoothing: the previous filtered value is multiplied by α and added to (1-α) times the current spatially processed value. The spatial-temporal effects are provided by the horizontal cells, so we refer to that signal as H_{i,j}, where i is the row index and j is the column index. Letting P_{i,j} represent the pixel value (the modeled receptor value) at the ith row and jth column, and using T as a temporary variable (for clarity), we have the following update algorithm:
\[ T = 0.25\,(1-\gamma)\,\left(H_{i-1,j} + H_{i,j-1} + H_{i,j+1} + H_{i+1,j}\right) + \gamma P_{i,j} \]
\[ H_{i,j} = (1-\alpha)\,T + \alpha H_{i,j} \]
The constants γ and α represent levels of spatial smoothing and temporal smoothing respectively, which in both cases gives the low-pass filtering effects of a gaussian filter (simultaneously in both time and space domains). These can be made adaptive once a performance metric is determined. Simulating the three filters in Figure 3.2.1-2 is accomplished by tapping the results at differing numbers of iterations as the visual information is processed. A few iterations implement a high cutoff (spatial-temporal) frequency, more iterations would give a medium cutoff frequency, and even more iterations a lower cutoff frequency. There are several degrees-of-freedom for experimentation, including the spatial and temporal smoothing constants along with the number of iterations for implementing the gaussian filters.
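The update and the tapping at different iteration counts can be sketched as follows. The synchronous update (all pixels computed from the previous H), the edge-replication boundary rule, the initialization of H to P, and the tap counts (2, 6, 18) are assumptions made for this illustration; the function names are hypothetical.

```python
def smooth_step(H, P, gamma, alpha):
    """One pass of the update: T mixes the 4-neighbor average of H with P,
    then H is blended with its previous value using alpha."""
    rows, cols = len(H), len(H[0])
    new_H = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Neighbors are clamped at the borders (assumed boundary rule).
            up = H[max(i - 1, 0)][j]
            down = H[min(i + 1, rows - 1)][j]
            left = H[i][max(j - 1, 0)]
            right = H[i][min(j + 1, cols - 1)]
            T = 0.25 * (1 - gamma) * (up + left + right + down) + gamma * P[i][j]
            new_H[i][j] = (1 - alpha) * T + alpha * H[i][j]
    return new_H

def tapped_filters(P, gamma, alpha, taps=(2, 6, 18)):
    """Iterate the update on frame P and keep H at the requested iteration
    counts: few iterations act as the high cutoff filter, more as the medium
    and low cutoff filters."""
    H = [row[:] for row in P]          # assumed initial state: H = P
    outputs = {}
    for n in range(1, max(taps) + 1):
        H = smooth_step(H, P, gamma, alpha)
        if n in taps:
            outputs[n] = H
    return outputs

# Usage: a single bright pixel spreads further at each later tap.
frame = [[0.0] * 9 for _ in range(9)]
frame[4][4] = 1.0
filters = tapped_filters(frame, gamma=0.5, alpha=0.5)
high, medium, low = filters[2], filters[6], filters[18]
```

Because P is re-injected on every pass, the iteration settles toward a fixed smoothed image for a static frame, so the tap counts matter most for the early iterations. An in-place update that reuses already updated neighbors would be an equally plausible reading of the update equations.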
3.2.2 Autonomous hovercraft using insect-based optic flow [Roub12]
It is well known that insects such as honeybees navigate their environment using optic flow cues in the visual field. Insect-inspired optic flow was demonstrated in a small hovercraft robot [Roub12] that autonomously followed a wall at a given distance and navigated a tapered corridor. The design is focused on obstacle avoidance in the azimuth plane, with four 2-pixel optic flow (OF) sensors at 45° and 90° on both the left and right sides. As seen in experiments with honeybees [Srini11], the velocity decreases as the craft moves through a tapered corridor, a natural consequence of maintaining constant OF as the sides get closer. The honeybee navigation was presumed to be the result of balancing the OF on both sides of the insect.
The hovercraft demonstrated the ability to adjust forward speed and clearance from the walls without rangefinders or tachometers. However, a magnetic compass and accelerometer were used to prevent rotation about the yaw axis so that the craft continued to move forward. This was necessary since the experiment focused on the OF cues and the ability to navigate the corridor.
The algorithm was developed in simulation and implemented on this hovercraft. All 4 sensors (two at ±45° and two at ±90° from the forward direction) were used in the navigation algorithm, which the authors call the dual lateral optic flow regulation principle. It offers a more comprehensive suggestion as to how honeybees navigate their environment than simply balancing optic flow from the two sides. This is an example of a bio-inspired sensor that is used to help biologists better understand how honeybees navigate their environment.
3.2.3 Autonomous hovercraft using optic flow for landing [Dup18]
In this effort, 12 optic flow pixel sensors implementing threshold-based motion detection are compared to a more traditional set of 12 optic flow pixels implementing a cross-correlation method. The cross-correlation method is more robust, but also more computationally complex. If a threshold method can be made to work sufficiently well, the complexity is greatly reduced. The drawback is that the performance is strongly dependent on the threshold, which can vary from scene to scene and with differing illumination conditions.
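The trade-off between the two methods can be sketched with two neighboring photoreceptor signals, where the second is a delayed copy of the first. The signal shape, the threshold value, and the function names below are hypothetical and are not taken from [Dup18]. In both cases the optic flow would follow from the inter-receptor angle divided by the estimated delay.

```python
import math

def passing_edge(amplitude, center, width=6.0, length=120):
    """Hypothetical photoreceptor signal: a smooth contrast bump passing by."""
    return [amplitude * math.exp(-((n - center) / width) ** 2)
            for n in range(length)]

def first_crossing(signal, threshold):
    """Index of the first upward threshold crossing, or None if there is none."""
    for n in range(1, len(signal)):
        if signal[n - 1] < threshold <= signal[n]:
            return n
    return None

def delay_by_threshold(a, b, threshold):
    """Delay in samples between the threshold crossings of the two signals."""
    ta, tb = first_crossing(a, threshold), first_crossing(b, threshold)
    if ta is None or tb is None:
        return None                      # no detection at this threshold
    return tb - ta

def delay_by_cross_correlation(a, b, max_lag):
    """Lag (0..max_lag) that maximizes the correlation between a and b."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(a[n] * b[n + lag] for n in range(len(a) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# The same edge seen under bright and dim illumination (15-sample true delay).
bright_a, bright_b = passing_edge(1.0, 40), passing_edge(1.0, 55)
dim_a, dim_b = passing_edge(0.4, 40), passing_edge(0.4, 55)
```

With a threshold of 0.5 the threshold method recovers the delay for the bright pair with only a comparison per sample, but it returns no estimate for the dim pair, whose peak never reaches the threshold. The cross-correlation method recovers the delay in both cases at the cost of a sum of products for every candidate lag.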
The application in mind is a hovercraft using optic flow sensing on the ventral side (underside) of the craft to ensure smooth landing. As an insect gets closer to the landing point, the optic flow underneath increases since the image texture is getting closer. If the insect keeps the optic flow constant, then its speed must decrease as it approaches, until the insect is at rest on the landing surface. To measure performance, the optic flow sensor was held fixed while a textured visual field was passed in front of it.
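The constant optic flow landing strategy can be illustrated with a small simulation. It assumes that the ventral OF is approximated by speed divided by height (ω = v/h) and that the craft descends along a fixed glide slope. Both assumptions, along with the parameter values and the function name, are illustrative and are not taken from [Dup18].

```python
import math

def constant_of_landing(h0, of_setpoint, slope_deg, dt=0.01, steps=500):
    """Simulate a descent that holds the ventral optic flow v / h constant.

    h0:          initial height above the surface
    of_setpoint: optic flow to maintain (rad/s)
    slope_deg:   assumed fixed glide slope of the descent
    Returns a list of (height, speed) pairs, one per time step.
    """
    slope = math.tan(math.radians(slope_deg))
    h, trace = h0, []
    for _ in range(steps):
        v = of_setpoint * h        # speed that keeps v / h at the set point
        trace.append((h, v))
        h -= v * slope * dt        # descend along the glide slope (Euler step)
    return trace

trace = constant_of_landing(h0=1.0, of_setpoint=2.0, slope_deg=30.0)
```

Since the speed is tied to the height, both decay together (exponentially in this model), so the craft approaches the surface ever more slowly and arrives with near-zero speed without measuring height or speed directly.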
3.2.4 Silicon Retina [Maha89]
The silicon retina [Maha89] is designed to emulate the initial processing layers of the retina, which include the photoreceptors, horizontal cells, and bipolar cells. An array of 48 x 48 pixels was fabricated using 2.0 µm design rules (width of conducting path) and pixel circuits about 109 x 97 µm in size. A hexagonal resistive grid is used so that local averages of pixels are more highly influenced by the six nearest neighbors than those farther away.
The triad synapse (connecting these three cell types) is modeled in silicon as a follower-connected transconductance amplifier. A capacitor stores the spatial-temporal signal of the photoreceptor, and an amplifier propagates the difference between this signal and the photoreceptor signal, modeling the bipolar cell center-surround antagonistic signal. The photodetector circuit is a bipolar transistor biased with a depletion region that responds logarithmically to the incoming light intensity, which corresponds to physiological recordings of natural photoreceptors.
The design was later revised with an adaptive photoreceptor circuit modulated by three feedback paths and individual time constants. The gain of the receptor is adaptive, and the circuit was more robust to transistor mismatches and temperature drifts than the original silicon retina. Another improvement was the incorporation of the edge signal position without the need for off-chip subtraction [Maha91].
3.2.5 Neuromorphic IR analog retina processor [Mass93]
Building on the silicon retina design, the Air Force Research Lab (AFRL, Eglin AFB) funded the development of an infrared sensor. One of the problems of emulating biological retinae with VLSI technology is that the area required to model the time constants observed in biology makes a 2D array of pixels unreasonably large. This IR sensor design used switched-capacitor technology with small capacitors to emulate the time constants of larger capacitors. Although such technology has no biological counterpart, it was successful in achieving biomimetic spatial-temporal response rates. The drawback of this technology is the additional noise caused by the 10 kHz switching speeds required for the design.
A 128 x 128 array of indium antimonide (InSb) detector elements at 50 µm pitch was connected 4-to-1 to create a 64 x 64 array at 100 µm pitch. This detector plane was bonded to a readout chip where each pixel used the 100 µm pitch area for the switched-capacitor and readout circuitry. The InSb diodes were connected in photovoltaic mode and responded logarithmically, as biological photoreceptors do. CMOS transistors configured as switched capacitors were used between pixel nodes to provide the spatial-temporal smoothing inherent in the laterally connected horizontal cell layers of the retina.
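The way a small switched capacitor stands in for a large time constant can be seen from the standard switched-capacitor approximation, in which a capacitor C_s toggled at frequency f_s behaves like a resistor. The capacitor values in the example are assumed for illustration only and are not taken from [Mass93]; the 10 kHz switching frequency is the one quoted above.

```latex
\[ R_{eq} = \frac{1}{f_s C_s}, \qquad \tau = R_{eq}\,C = \frac{C}{f_s C_s} \]
% Example with assumed values C_s = 0.1 pF and C = 1 pF at f_s = 10 kHz:
\[ R_{eq} = \frac{1}{(10^{4}\,\mathrm{Hz})(10^{-13}\,\mathrm{F})} = 10^{9}\,\Omega,
   \qquad \tau = (10^{9}\,\Omega)(10^{-12}\,\mathrm{F}) = 1\,\mathrm{ms} \]
```

Under these assumed values a millisecond-scale time constant is obtained from picofarad-scale capacitors, with the time constant set by the capacitor ratio and the switching frequency rather than by a large passive component.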
The result was a medium-wave IR (MWIR) camera with localized gain control. The camera captured imagery of a gas torch in front of a lamp with a large flood light bulb. Conventional cameras at that time would saturate in all the lighted areas unless a global gain control were in place, in which case the objects in the darker parts of the image would not be seen. In this experiment the filament of the light bulb, the outline of the torch flame, as well as the object in the darker parts of the image could be clearly seen. This is the benefit of localized gain control of natural biological retinae and bio-inspired sensors that model them.
3.2.6 Michaelis-Menten auto-adaptive pixels M2APix [Maf15]
The vision system of primates (and other animals) provides responses over a wide range of luminosities while at the same time providing good sensitivity to local contrast changes. This gives the vision system the ability to simultaneously distinguish a bright object against a bright background in one part of the image and a dark object against a dark background in another part of the image. The wide range of luminosities is facilitated by the opening and closing of the iris as well as the natural logarithmic response of the photoreceptors. The good sensitivity is facilitated by the lateral inhibition of the post-photoreceptor processing neurons, the horizontal cells.
Many machine vision designers have sought to develop wide dynamic range sensors and have looked to the natural vision system for inspiration. The Delbruck adaptive pixel [Del94] used the logarithmic photoreceptor circuit of the original silicon retina [Maha88] and is used in comparison with the Michaelis-Menten auto-adaptive pixel (M2APix) proposed here [Maf15].
The Michaelis-Menten equation [Mich1913] was derived to model enzyme kinetics in biochemistry. It describes the rate of enzymatic reactions in terms of the maximum rate achieved when the substrate is saturated and a constant representing the substrate concentration when the reaction rate is half the maximum rate [WikiMM]. It is adapted in [Maf15] to describe the photoreceptor’s response, V, in terms of the maximum response at illumination saturation, Vm, the light intensity, I, and an adaptation parameter, σ, given in [Maf15] as
\[ V = V_m \frac{I^n}{I^n + \sigma^n} \]
Substituting V with the enzymatic reaction rate, Vm with the maximum rate when the substrate concentration is saturated, I with the substrate concentration, and σ with the Michaelis constant, which is the substrate concentration when the rate is half Vm, and letting n = 1 this equation reduces to the original biochemistry equation [WikiMM].
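A short numerical sketch of this response is given below, assuming the form V = Vm I^n / (I^n + σ^n), which reduces to the Michaelis-Menten equation when n = 1. The function name is hypothetical, and tying σ to the local mean intensity in the usage lines is an assumption made to illustrate adaptation, not a description of the M2APix circuit.

```python
def photoreceptor_response(intensity, v_max, sigma, n=1.0):
    """Michaelis-Menten style response: V = Vm * I^n / (I^n + sigma^n)."""
    i_n = intensity ** n
    return v_max * i_n / (i_n + sigma ** n)

# At I = sigma the response is half of the maximum.
half = photoreceptor_response(intensity=5.0, v_max=1.0, sigma=5.0)

# Assumed adaptation: sigma follows the mean intensity, so the same 20 %
# contrast step produces the same response six decades apart.
dim = photoreceptor_response(intensity=1.2e-3, v_max=1.0, sigma=1.0e-3)
bright = photoreceptor_response(intensity=1.2e3, v_max=1.0, sigma=1.0e3)
```

Without adaptation (a fixed σ) the same two inputs would sit near the bottom and the top of the curve respectively, where the slope is small and the contrast step is barely visible.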
The Delbruck adaptive pixel provides a 7-decade range of light adaptation and a 1-decade range of contrast sensitivity. There were some issues raised concerning steady-state responses increasing with light intensity and inconsistent transient responses under large contrast sensitivity. Other methods using resistive grids to emulate horizontal cell networks resulted in 4 decades of sensitivity but required external voltage sources to set bias points [Maf15].
A photoreceptor array of 12 M2APix pixels and 12 Delbruck pixels was fabricated and used for comparison. The 2 x 2 mm silicon retina was mounted in a 9 x 9 mm package with the two 12-pixel arrays side-by-side for comparison. The experimental results confirmed that the M2APix pixels responded to a 7-decade range of luminosities with a 2-decade range of contrast sensitivities. The advantage over the Delbruck adaptive pixel is that it produces a steadier contrast response over the 7 decades of luminosities, so that the least significant bit (LSB) corresponds to a lower value and therefore gives better contrast resolution [Maf15].
3.2.7 Autonomous hovercraft using insect-based optic flow [Van17]
A bio-inspired eye is designed to allow an aerial vehicle to navigate passively through corridors and land smoothly by having vision sensors respond to optic flow in the front, on the two sides, and on the bottom. An example would be a quadrotor exploring a building by keeping a certain distance from the walls.
Called the “OctoM2APix”, the 50 g sensor includes 8 Michaelis-Menten auto-adaptive pixels (M2APix): 3 measuring optic flow (OF) on the left side, 3 measuring OF on the right, and 2 on the bottom measuring OF on the ground underneath the vehicle. The center pixel on each side measures OF at right angles to the heading; of the other two, one points between the side and the front, while the other points between the side and the rear. Each side covers about 92° in the horizontal plane. The objective is to allow the vehicle to correct its heading based on the differing OF measurements of the three pixels on either side.
The experimental results were with the OctoM2APix sensor stationary and a textured surface moving next to it at various angles with respect to the (simulated) heading, or the direction of the front of the sensor. The experimental heading included 0°, where the vehicle would be following a wall-like surface, +20°, where the (non-simulated) vehicle would eventually collide with the surface if not corrected, -20°, where the vehicle would be separating from the surface, and -45°, where the vehicle would be separating at a faster rate. The OF on forward and rear side pixels should offset each other when heading is parallel to the surface, and the difference between forward and rear side pixels would provide cues for the heading with respect to the wall, and thus allow the vehicle to adjust its heading if the goal were to follow the wall at a constant rate. The experimental results are shown by calculating heading from the center and forward pixel, the center and rear pixel, and the forward and rear pixel, the latter being the best estimate when all three sensors had the surface in view. This makes sense since this would be the widest separation.
3.2.8 Emulating fovea with multiple regions of interest [Azev19]
There are many applications where image resolution is high in some region of interest (ROI) and low in the remaining portion of the image. This is a crude resemblance of a foveated image but could be argued to be bio-inspired by the fovea. In both natural and synthetic designs, the idea is to conserve computational resources by using a higher sampling in an ROI (center gaze for biology) and lower sampling elsewhere. Non-uniform sampling is seen in all natural sensory systems, as biology has adapted to the different levels of relevance of natural stimuli (passive or active). In many commercial and military applications multiple ROIs could be employed, but this is rare in biology, if it exists at all. (Vision systems have a single fovea, but it could be argued that there are multiple regions of higher sampling, for example, in the sense of touch, as the well-known somatotopic map would suggest.)
A vehicle tracking system designed for self-driving cars uses multiple ROIs and claims fovea inspiration. The subsequent image processing is developed using deep-learning neural networks, which again implies some level of bio-inspiration. The system uses vehicle waypoints, which are the expected future locations of the vehicles, and continually crops the image looking for other vehicles. This is analogous to drivers looking down the road they are traveling. The experimental results claimed an improvement of long-range car detection from 29.51% to 63.15% over using a single whole image [Azev19]. As pointed out before, many researchers are more focused on solving engineering problems (as they should be) and not too concerned with the level of biomimicry. Therefore, there can be a chasm between the levels of biomimicry of the various efforts claiming bio-inspired designs.
3.2.9 Using biological nonuniform sampling for better virtual realization [Lee18]
There are other applications that are not mimicking biology but considering the high spatial acuity of the fovea. For example, a head-mounted display can consider the gaze direction for visualization of 3D scenes with multiple layers of 2D images. The goal is to improve received image quality and accuracy of focus cues by taking advantage of the loss of spatial acuity in the periphery without the need for tracking the subject’s pupil [Lee18]. Another example is a product (called Foveator) that tracks the motion of the pupil and limits high-resolution rendering only in the direction needed. The intended application is for improved virtual reality (VR) experience [see www.inivation.com]. These ideas leverage natural design information to relax requirements of a visual system to avoid providing more than necessary as opposed to using the design of natural fovea to inspire newer designs.
3.2.10 Asynchronous event-based retinas [Liu15a]
Conventional camera systems produce pixelated, digitized, and framed pictures for subsequent spatial, temporal, and chromatic processing. Natural vision systems send asynchronous action potentials (spikes) when the neuronal voltage potential exceeds a threshold, which can happen at any time rather than on the leading edge of a digital clock cycle. The information is thus gathered asynchronously, and these information spikes only occur when there is something to cause them. Mimicking this biological behavior are the emerging asynchronous event-based systems. Progress has been slow due in part to the unfamiliarity of the silicon industry with non-clocked (asynchronous) circuitry. The emulation of cell types is limited, as industry is reluctant to reduce the pixel fill areas to make room for additional functionality [Liu15a]. In mammals the light travels through the retinal neuron layers and then through many layers of photopigment, allowing more opportunities for photon capture than a single pass (such as through the depletion region of a pn junction). The chip real estate for asynchronous (analog) processing in the retina does not conflict with the photon-capturing photoreceptors, as the retina is transparent to the incoming photonic information.
One asynchronous silicon retina design includes an attempt to mimic the magno- and parvo-cellular pathways (MP and PP) of the optic nerve [Zag04]. The sustained nature of the PP and the transient nature of the MP are pursued, which results in both ON and OFF ganglion cells for both MP and PP, as is observed in natural vision systems. The benefit is natural contrast adaptation in addition to adaptive spatio-temporal filtering. The resulting localized automatic gain control provides a wide dynamic range and makes available two separate spatial-temporal bandpass-filtered representations of the image. One of the challenges of using this vision system is the large non-uniformity between pixel responses [Liu15a]. Gross non-uniformity between receptors and neurons is common in natural sensory systems, as the adaptive (plastic) nature of neurons compensates for such non-uniformities.
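The event-based principle can be sketched for a single pixel as follows. The model below is a common software abstraction of an event pixel (an ON or OFF event each time the log intensity moves by a fixed threshold away from a stored reference); it is an assumption for illustration and does not describe a specific circuit from [Liu15a]. The sample loop stands in for what would be continuous-time, unclocked operation in hardware.

```python
import math

def generate_events(samples, threshold):
    """Return (sample_index, polarity) events for one pixel.

    An event is emitted each time the log intensity has changed by at least
    `threshold` since the last event; polarity is +1 (ON) or -1 (OFF).
    """
    events = []
    reference = math.log(samples[0])
    for n, value in enumerate(samples[1:], start=1):
        change = math.log(value) - reference
        while abs(change) >= threshold:
            polarity = 1 if change > 0 else -1
            events.append((n, polarity))
            reference += polarity * threshold   # move the reference one step
            change = math.log(value) - reference
    return events

# A static scene produces no output; a brightening and a dimming produce
# ON and OFF events only at the moments of change.
static_events = generate_events([2.0] * 20, threshold=0.2)
changing_events = generate_events([1.0] * 5 + [1.6] * 5 + [0.9] * 5, threshold=0.2)
```

Unlike a framed camera, which would report all 15 samples of the changing signal, this pixel reports only a handful of events clustered at the two transitions, and nothing at all for the static signal.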
3.2.11 Emulating retina cells using photonic networks of spiking lasers [Rob20]
Silicon retinas and cochleae such as those in [Liu15] use hard silicon to emulate the behavior of biological neuronal networks that are adaptive and exhibit plasticity. Nevertheless, these bio-inspired designs show the promise of such novel sensory systems. In a similar way, vertical cavity surface emitting lasers (VCSELs) are used to emulate the responses of certain neurons in the retina and are referred to as VCSEL-neurons. In biology the photonic energy is converted to a graded (or analog) potential by the biochemistry of the photopigments of the photoreceptor cells. By keeping the information photonic, computation speeds can improve by more than 7 orders of magnitude. This dramatic improvement in information processing performance has wide applications for computationally intense algorithm frameworks such as artificial intelligence and deep learning.
This effort demonstrates retinal cell emulation using off-the-shelf VCSEL components operating at conventional telecom wavelengths. The VCSEL-neurons were configured to emulate the spiking behavior of ON and OFF bipolar cells as well as retinal ganglion cells. In these silicon and photonic applications, we see biology as an inspiration for novel information processing strategies, which are then combined with available technology that does not emulate the way biology works. A similar example of this concept is seen in the fovea-inspired applications of Sections 3.2.8 and 3.2.9.
3.2.12 Integrating insect vision polarization with other vision principles [Giak18]
The visual sensory systems of many species are designed to process environment-provided stimuli that have space, time, and color dimensions. Arthropods and some marine species have been shown to be sensitive to the polarization of light as well. For example, the octopus retina has tightly packed photoreceptor outer segments composed of microvilli that are at right angles to the microvilli of neighboring photoreceptor cells. The microvilli orientation alternates between these right angles, and this is believed to give the octopus sensitivity to polarized light [Smith00].
Aluminum nanowire polarization filters are used in this effort [Giak18] to emulate the microvilli of the ommatidia, the components that make up the compound eye. Polarization measurements were made to characterize the polarization of several polymers. A previously designed neuromorphic camera system is used with polarization filters to show that visual recognition of a rotating blade improves when polarization information is used [Giak18].