Olshausen Field 2000 Amsci


    A reprint from

    American Scientistthe magazine of Sigma Xi, The Scientific Research Society

This reprint is provided for personal and noncommercial use. For any other use, please send a request to Permissions, American Scientist, P.O. Box 13975, Research Triangle Park, NC 27709, U.S.A., or by electronic mail to [email protected]. © Sigma Xi, The Scientific Research Society and other rightsholders.


Peer out your window. Unless you are particularly lucky, you might think that your daily view has little affinity with some of the more spectacular scenes you have taken in over the years: the granite peaks of the high Sierra, the white sands and blue waters of an unspoiled tropical island or just a beautiful sunset. Strangely, you would be wrong. Most scenes, whether gorgeous or ordinary, display an enormous amount of similarity, at least in their statistical properties. By characterizing this regularity, investigators have gained important new insights about our visual environment, and about the human brain.

This advance comes from the efforts of a diverse set of scientists (mathematicians, neuroscientists, psychologists, engineers and statisticians) who have been rigorously attacking the problem of how images can best be encoded and transmitted. Some of these investigators are interested in devising algorithms to compress digital images for transmission over the airwaves or through the Internet. Others (like ourselves) are motivated to learn how the eye and brain process visual information. This research has led workers to the remarkable conclusion that nature has found solutions that are near to optimal in efficiently representing images of the visual environment. Just as evolution has perfected designs for the eye by making the most of the laws of optics, so too has it devised neural circuits for vision by obeying the principles of efficient coding.

To appreciate these feats of natural engineering, one first needs a basic understanding of what neuroscientists have learned over the years about the visual system. Most of what is known comes from studies of other animals, primarily cats and monkeys. Although there are differences among various mammals, there are enough similarities that neuroscientists can make some reasonable generalizations about how the human visual system operates.

For example, they have known for many decades that the first stage of visual processing takes place within the retina, in a network of nerve cells (neurons) that process information coming from photoreceptors. The results of these mostly analog computations feed into retinal ganglion cells, which represent the information in digital form (as a train of voltage spikes) and pass it through long projections that carry signals outward (axons). Bundled together, these axons form the optic nerve, which exits the eye and makes connections with neurons in a region near the center of the brain called the lateral geniculate nucleus. These neurons in turn send their outputs to the primary visual cortex, an area at the rear of the brain that is also referred to as V1.

Neurons situated along this pathway are usually characterized in terms of their receptive fields, which delineate where in the visual field light either raises or lowers the level of neural activity. Neurons in the retina and lateral geniculate nucleus usually have receptive fields with excitatory and inhibitory zones arranged roughly in concentric circles, whereas neurons in V1 typically have receptive fields with parallel bands of excitation and inhibition. At higher stages of visual processing, involving, for example, the areas known as V2 and V4, receptive fields become progressively more complex; yet characterizing what exactly these neural circuits are computing remains elusive.

Although a vast amount of information about the inner workings of the visual system has been gathered over the years, neuroscientists are still left with the question of why nature has fashioned this neural circuitry specifically in the way that it has. We believe the answer is that the visual system organizes itself to represent efficiently the sorts of images it normally takes in, which we call natural scenes.

The Uniformity of Nature

Natural scenes, as we define them, are images of the visual environment in which the artifacts of civilization do not appear. Thus natural scenes might show mountains, trees or rocks, but they would not include office buildings, telephone poles or coffee cups. (Although we make this distinction, most of our conclusions apply to artificial environments as well.) The images for our studies come from photographs we have taken with conventional cameras and digitized with a film scanner. We then calibrate these digitized images to account for the nonlinear aspects of the photographic process. After doing so, the pixel values scale directly with the intensity of light in the original scene (Figure 1).

Why should a diverse set of images obtained in this way show any statistical similarity with one another when the natural world is so varied? One way to get an intuitive feel for the answer is to consider how images look

238 American Scientist, Volume 88. © 2000 Sigma Xi, The Scientific Research Society. Reproduction with permission only. Contact [email protected].

Bruno A. Olshausen and David J. Field have worked together since 1994. Olshausen received his doctorate in computation and neural systems in 1994 from the California Institute of Technology. In 1996, he joined the faculty of the Department of Psychology and the Center for Neuroscience of the University of California, Davis. Field received his doctorate in psychology from the University of Pennsylvania in 1984 and worked at the Physiological Laboratory at the University of Cambridge until 1990. He is currently a professor in the Department of Psychology of Cornell University. Address for Olshausen: Center for Neuroscience, University of California, Davis, 1544 Newton Court, Davis, CA 95616. Internet: [email protected]

    Vision and the Coding of Natural Images

The human brain may hold the secrets to the best image-compression algorithms

    Bruno A. Olshausen and David J. Field


Figure 1. Images of the natural environment, such as this view of a log resting on a stony embankment (top), exhibit a surprising degree of statistical similarity. To investigate these qualities, the authors had first to remove the effects of the photographic process from their images, yielding estimates for the actual brightness (luminance) in each pixel. Because luminance spans an enormous range (it varies from about 100 to 40,000 candles per square meter in this image), linearly scaling these values to the shades that can be printed makes the scene look strangely dim and stark (lower right). Histograms of pixel intensity (yellow panels) show that the distribution of luminance values is short and wide in a light region, whereas it is narrow and peaked in a dark area. Summing the results from the three sample regions (white boxes) produces a distribution skewed toward low values, one that matches the shape of the histogram obtained for the image as a whole.

2000 May–June 239. © 2000 Sigma Xi, The Scientific Research Society. Reproduction with permission only. Contact [email protected].



when they are totally disordered (see Figure 2). We created this rather drab image by assigning the intensity of each pixel a random value. This process could have, in theory, produced a stunning image, one that rivals any photograph Ansel Adams ever took (just as sitting a monkey down at a typewriter could, in theory, produce Hamlet). But the odds of generating a picture that is even crudely recognizable are exceedingly slim, because natural scenes represent just a minuscule fraction of the set of all possible patterns. The question thus becomes: What statistical properties characterize this limited set?

One simple statistical description of an image comes from the histogram of intensities, which shows how many pixels are assigned to each of the possible brightness values. The first thing one discovers in carrying out such an analysis is that the range of intensity is enormous, varying over eight orders of magnitude from images captured on a moonless night to those taken on a sunny day. Even within a given scene, the span is usually quite large, typically about 600 to one. And in images taken on a clear day the range of intensity between the deepest shadows and the brightest reflections can easily be greater. But a large dynamic range is not the only obvious property that natural scenes share. One also finds that the form of the histogram is grossly similar, usually peaked at the low end with an exponential fall-off toward higher intensities.

Why does this lopsided distribution arise? The best explanation is that it results from variations in lighting. Consider the image shown in Figure 1. It has a broad distribution of reflectances across the scene but also displays obvious changes in illumination from one place to the next. The objects in each part of the image might have fundamentally similar ranges of reflectance, but because some spots are illuminated more strongly than others, the pixel values in each zone essentially get multiplied by a variable factor. So the intensities in a well-illuminated region tend to show both a higher mean and a higher variance than those in a poorly lighted area. As a result, the distribution of pixel intensities within a bright portion of the image is short and fat, whereas in a dark one it is tall and skinny. If pixel intensities are averaged over many such regions (or, indeed, over the entire image), one obtains a smooth histogram with the characteristic peak and fall-off.

Such a histogram can be thought of as a representation of how frequently a typical photoreceptor in the eye experiences each of the possible light levels.
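This multiplicative model of lighting is easy to simulate. In the sketch below, every region draws its reflectances from the same distribution but is lit at its own level; the region counts and the reflectance and illumination ranges are illustrative assumptions, not measurements from the authors' images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each region has a similar spread of reflectances, but its own
# illumination level multiplies every pixel in that region.
reflectance = rng.uniform(0.05, 0.95, size=(1000, 50))  # 1000 regions, 50 pixels each
illumination = rng.uniform(100, 40000, size=(1000, 1))  # per-region light level
luminance = reflectance * illumination

# Within a brightly lit region the distribution is "short and fat";
# within a dim one it is "tall and skinny".
bright = luminance[np.argmax(illumination)]
dark = luminance[np.argmin(illumination)]
print(bright.std() > dark.std())

# Pooled over all regions, the histogram is peaked at low values with a
# long tail toward high intensities, so the mean sits above the median.
pooled = luminance.ravel()
print(pooled.mean() > np.median(pooled))
```

Both comparisons come out true: the bright region's intensities are far more spread out, and the pooled distribution is skewed toward low values, just as the summed histogram in Figure 1 is.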

In reality, the situation is more complicated, because the eye deals with this vast dynamic range in a couple of different ways. One is that it adjusts the iris, which controls the size of the pupil (and thus the amount of light admitted to the eye) depending on the ambient light level. In addition, the neurons in the retina do not directly register light intensity. Rather, they encode contrast, which is a measure of the fluctuations in intensity relative to the mean level.

Given that these neurons respond to contrast, how would it make the most sense for them to encode this quantity? Theory dictates that a communication channel attains its highest information-carrying capacity when all possible signal levels are used equally often. It is easy to see why this is so in an extreme case, say where the signal uses only half of the possible levels. Like a pipe half full of water, the information channel would be carrying only 50 percent of its capacity. But even if all signal levels are employed, the full capacity is still not realized if some of these levels are used


Figure 2. White-noise image, created by independently assigning the intensity of each pixel a random value, contains no statistical order and looks nothing like the natural scenes one is used to seeing.


Figure 3. Contrast-response function (red points) for retinal neurons (the so-called large monopolar cells) in the eye of a fly displays an S shape. These responses very nearly match the curve (black line) that transforms the distribution of contrasts a fly typically encounters (horizontal yellow panel) into a flat distribution (vertical yellow panel), accomplishing what specialists in signal processing call histogram equalization. (Adapted from Laughlin, 1981.)


only rarely. So if maximizing information throughput is the objective, the neurons encoding contrast should do so in a way that ensures their output levels are each used equally often. And there is indeed evidence that this transformation, called histogram equalization, goes on in the eye.

In the early 1980s, Simon Laughlin, working at the Australian National University in Canberra, examined the responses of large monopolar cells in the eyes of flies. These are neurons that receive input directly from photoreceptors and encode contrast in an analog fashion. He showed that these neurons have a response function that is well suited to produce a uniform distribution of output levels for the range of contrasts observed in the natural environment, or at least in the natural environment of a fly (Figure 3).
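The response function that equalizes the output histogram is simply the cumulative distribution of the inputs. The sketch below uses a peaked, heavy-tailed Laplacian distribution of contrasts as an illustrative stand-in for a fly's visual diet (not Laughlin's actual measurements):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the contrasts an eye encounters: peaked near zero with
# heavy tails (assumed here for illustration).
contrasts = rng.laplace(loc=0.0, scale=0.2, size=100_000)

# Histogram equalization: the response function is the empirical
# cumulative distribution of the inputs, so every output level in
# [0, 1] ends up used equally often.
sorted_c = np.sort(contrasts)

def response(c):
    """Map a contrast to an output in [0, 1] via the empirical CDF."""
    return np.searchsorted(sorted_c, c) / len(sorted_c)

outputs = response(contrasts)

# The outputs are nearly uniform: each decile holds about 10 percent.
counts, _ = np.histogram(outputs, bins=10, range=(0.0, 1.0))
print(counts / len(outputs))
```

The printed fractions all come out at essentially 0.1, whatever the input distribution: the S-shaped curve in Figure 3 is exactly such a cumulative-distribution function for the fly's world.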

Investigators have found similar response functions for vertebrates as well. So it would seem that retinal neurons somehow "know" about the statistics of their visual environment and have arranged their input-output functions accordingly. Whether this achievement is an evolutionary adaptation or the result of an adjustment that continues throughout the lifetime of an organism remains a mystery. But it is clear that these cells are doing something that is statistically sensible.

Spatial Structure

Having considered a day in the life of an individual photoreceptor, the next logical thing to do is to examine a day in the life of a neighborhood of photoreceptors. That is, how does the light striking adjacent photoreceptors covary? If you look out your window and point to any given spot in the scene, it is a good bet that regions nearby have similar intensities and colors. Indeed, neighboring pixels in natural images generally show very strong correlations (Figure 4). They tend to be similar because objects tend to be spatially continuous in their reflectance.
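The falloff of correlation with distance is easy to check numerically. Here a smoothed random signal stands in for one scan line of a natural image; the nine-sample smoothing window is an arbitrary choice, not a property of real scenes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Smooth white noise with a running average so that nearby samples
# share structure, as neighboring pixels do in natural scenes.
noise = rng.standard_normal(100_000)
kernel = np.ones(9) / 9
smooth = np.convolve(noise, kernel, mode="same")

def shifted_correlation(signal, d):
    """Correlation between a sample and its neighbor d positions away."""
    return np.corrcoef(signal[:-d], signal[d:])[0, 1]

# Correlation weakens with separation, as in Figure 4.
r1, r2, r4 = (shifted_correlation(smooth, d) for d in (1, 2, 4))
print(round(r1, 2), round(r2, 2), round(r4, 2))
```

The three printed values decrease steadily but all remain well above zero, mirroring the scatter plots of Figure 4 at separations of one, two and four pixels.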

There are various ways to represent these correlations. One of the most popular is to invoke Fourier theory and use the shape of the spatial-frequency power spectrum. As Fourier showed long ago, any signal can be described as a sum of sine and cosine waveforms of different amplitudes and frequencies. If the signal under consideration is an image, the sines and cosines become functions of space (say, of x or y), undulating between light and dark as one moves across the image from left to right and from top to bottom.

When a typical scene is decomposed in this way, one finds that the amplitudes of the Fourier coefficients fall with frequency, f, by a factor of approximately 1/f (Figure 5). This universal property of natural images reflects their scale invariance: As one zooms in or out, there is always an equivalent amount of structure (intensity variation) present. This fractal-like trait is also found in many other natural signals: height fluctuations of the Nile River, the wobbling of the earth's axis, the shape of coastlines and animal vocalizations, to name just a few examples.
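One way to see what a 1/f amplitude spectrum means is to synthesize a signal that has one. This is a one-dimensional sketch (real images require the two-dimensional transform), and the random phases are an assumption of convenience:

```python
import numpy as np

rng = np.random.default_rng(3)

# Build a spectrum whose amplitude falls as 1/f, give each component a
# random phase, then invert it to get a scale-invariant "scan line".
n = 4096
freqs = np.fft.rfftfreq(n)
amplitude = np.zeros(freqs.size)
amplitude[1:] = 1.0 / freqs[1:]          # 1/f fall-off; skip the DC term
phases = rng.uniform(0, 2 * np.pi, size=freqs.size)
signal = np.fft.irfft(amplitude * np.exp(1j * phases), n=n)

# Measure the spectrum back and fit a line in log-log coordinates:
# a 1/f spectrum appears as a line of slope -1, as in Figure 5.
measured = np.abs(np.fft.rfft(signal))
slope, _ = np.polyfit(np.log(freqs[1:-1]), np.log(measured[1:-1]), 1)
print(round(slope, 2))  # -1.0
```

On a log-log plot the amplitude spectrum is a straight line of slope -1, which is exactly how the red curve in Figure 5 tracks the black 1/f reference line.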

Given that natural images reliably exhibit this statistical property, it is quite reasonable to expect that the visual system might take advantage of it. After all, each axon within the optic nerve consumes both volume and energy, so neglecting spatial structure and allowing high correlations among the signals carried by these wires


Figure 5. Amplitudes of the Fourier components in natural images (red line) fall with spatial frequency (f) by approximately 1/f (black line). This property is also found for many other natural signals that exhibit a self-similar (that is, fractal) character.

Figure 4. Correlation between two adjacent pixels in natural images is typically quite high, as plotting the brightness value of one against the other reveals (left panel). If the points considered are situated two pixels apart, the correlation is somewhat less obvious (middle panel). If they are situated four pixels apart, the correlation is weaker still, but it remains easy to discern in a scatter plot (right panel).


Figure 6. Synthetic image that preserves the two-point correlations found in natural scenes appears curiously natural. But this image lacks the sharp discontinuities in intensity that are so commonly seen at the edges of objects.


would constitute a poor use of resources. Might the neurons within the retina improve efficiency by pre-processing visual information before it leaves the eye and passes down the optic nerve? And if so, what kind of manipulations would make sense?

Redundancy Reduction

The answer comes from a theory that Horace Barlow of the University of Cambridge formulated nearly 40 years ago. He proposed a simple self-organizing principle for sensory neurons, namely that they should arrange the strengths of their connections so as to encode incoming sensory information in a manner that maximizes the statistical independence of their outputs (hence minimizing redundancy). Barlow reasoned that the underlying causes of visual signals are usually independent entities (separate objects moving about in the world), and if certain neurons somewhere in the brain are to represent these objects properly, their responses should also be independent. Thus, by minimizing the redundancy inherent in the sensory input stream, the nervous system might be able to form a representation of the underlying causes of images, something that would no doubt be useful to the organism.

Many years passed before Barlow's theory was put to work in a quantitative fashion to account for the properties of retinal ganglion cells, first by Laughlin and his colleagues in Canberra (in the early 1980s) and then a decade later by Joseph Atick, who was working at the Institute for Advanced Study in Princeton. Atick considered the form of correlations that arise in natural images, namely the 1/f amplitude spectrum. He showed that the optimal operation for removing these correlations is to attenuate the low spatial frequencies and to boost the high ones in inverse proportion to their original amplitudes. The reason is quite simple: A decorrelated image has a spatial-frequency power spectrum that is flat (the spatial-frequency equivalent of white noise), which is just what Atick's transformation yields.

Atick's theory thus explains why retinal neurons have the particular receptive fields they do: The concentric zones of excitation and inhibition essentially act as a whitening filter, which serves to decorrelate the outputs sent down the optic nerve. The specific form of the receptive fields that Atick's theory predicts nicely matches the properties of retinal ganglion cells in terms of spatial frequency. And recently Yang Dan, now at the University of California, Berkeley, showed that Atick's theory also accounts for the temporal-frequency response of neurons in the lateral geniculate nucleus.
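The whitening operation is easy to state in the frequency domain: multiply each component by a gain inversely proportional to its original amplitude. Here is a one-dimensional sketch with a synthetic 1/f signal; the retina implements the equivalent with center-surround receptive fields rather than an explicit Fourier transform.

```python
import numpy as np

rng = np.random.default_rng(4)

# A synthetic 1/f signal standing in for a natural-image scan line.
n = 4096
freqs = np.fft.rfftfreq(n)
spectrum = np.zeros(freqs.size, dtype=complex)
spectrum[1:] = (1.0 / freqs[1:]) * np.exp(1j * rng.uniform(0, 2 * np.pi, freqs.size - 1))
image = np.fft.irfft(spectrum, n=n)

# Whitening filter: gain proportional to f, the inverse of the 1/f
# amplitude, so low frequencies are attenuated and high ones boosted.
transformed = np.fft.rfft(image)
whitened = transformed * freqs

# The result has a flat amplitude spectrum, like white noise.
amps = np.abs(whitened[1:-1])
flatness = amps.std() / amps.mean()
print(round(flatness, 3))  # 0.0: every frequency carries equal amplitude
```

The vanishing spread of the whitened amplitudes is the point: after the filter, no frequency band is redundant with any other.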

Sparse Coding

The agreement between the theory of redundancy reduction and the workings of nerve cells in the lower levels of the visual system is encouraging. But such mechanisms for decorrelation are just the tip of the iceberg. After all, there is more to natural images than the obvious similarity among pairs of nearby pixels.

One way to get a feel for the statistical structure present is to consider what images would look like if they could be completely characterized by two-point correlations among pixels (Figure 6). One of the most obvious ways that natural scenes differ from such images is that they contain sharp, oriented discontinuities. Indeed, it is not hard to see that most images contain regions of relatively uniform structure interspersed with distinct edges, which give rise to unique three-point and higher correlations. So one must also consider how neurons might reduce the redundancy that comes about from these higher-order forms of structure.

A natural place to look is the primary visual cortex, which has been the focus of many studies since the early 1960s, when David Hubel and Torsten Wiesel at Harvard University first charted the receptive fields of these neurons and discovered their spatially localized, oriented and bandpass properties. That is, each neuron in this area responds selectively to a discontinuity in luminance at a particular location, with a specific orientation and containing a limited range of spatial frequencies. By the middle of the 1970s, some investigators began modeling these neurons quantitatively and were attempting to represent images with these models.

Stjepan Marcelja, a mathematician at the Australian National University, noticed some of these efforts by neuroscientists and directed their attention to theories of information processing that Dennis Gabor developed during the 1940s. Gabor, a Hungarian-English scientist who is most famous for inventing holography, showed that the function that is optimal for matching features in time-varying signals simultaneously in both time and frequency is a sinusoid with a Gaussian (bell-shaped) envelope. Marcelja pointed out that such functions, now commonly known as Gabor functions, describe extremely well the receptive fields of neurons in the visual cortex (Figure 7). From this work, many neuroscientists concluded that the cortex must be attempting to represent the structure of images in both space and spatial frequency. But the Gabor theory still leaves open the question of why such a joint space-frequency representation is important. Is it somehow particularly well suited to the higher-order statistical structure of natural images?
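A Gabor function is just a sinusoid multiplied by a Gaussian envelope. This sketch samples one on a small patch; the particular width, wavelength and orientation are arbitrary choices, not fitted receptive-field parameters.

```python
import numpy as np

def gabor(x, y, sigma=2.0, wavelength=4.0, theta=0.0, phase=0.0):
    """Two-dimensional Gabor function: a sinusoidal carrier under a
    Gaussian envelope (parameter values here are illustrative)."""
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength + phase)
    return envelope * carrier

# Sample the function on a small patch, like a cortical receptive field.
coords = np.arange(-6, 7)
xx, yy = np.meshgrid(coords, coords)
patch = gabor(xx, yy)

# For theta = 0, excitatory and inhibitory bands alternate along x,
# and the Gaussian envelope is symmetric in y, so opposite rows match.
print(patch.max() > 0 > patch.min())
print(np.allclose(patch[0], patch[-1]))
```

The alternating positive and negative bands, fading toward the edges of the patch, are the pattern shown in the lower row of Figure 7.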

About 15 years ago, one of us (Field) began probing this question by investigating the connection between the higher-order statistics of natural scenes and the receptive fields of neurons in the visual cortex. This was a time when the linear-systems approach to the visual sys-


Figure 7. Receptive fields of neurons in the visual cortex of cats (top) resemble certain two-dimensional Gabor functions (bottom). The neural circuitry of the visual system may adopt such forms of response because they are well suited to encode images efficiently. (After Daugman, 1989.)



tem had garnered considerable popularity. Years of research had provided many insights into how the visual system responds to simple stimuli (like spots and gratings) but revealed little about how the brain processes real images.

At the time, most scientists studying the visual system were under the impression that natural scenes had little statistical structure. And few believed that it would be useful even to examine the possibility. Field's first efforts to do so using a set of highly varied images (of rocks, trees, rivers and so forth) consistently showed the characteristic 1/f spectra, prompting some skeptics to assert that something had to be wrong with his camera.

The discovery of such statistical consistency in natural scenes prompted Field to investigate whether the Gabor-like receptive fields of cortical neurons are somehow tailored to match this structure. He did this by examining histograms obtained after filtering the images with a two-dimensional Gabor function, a task requiring the pixel-by-pixel multiplication of intensity values in the image with a Gabor function defined within a patch just a few pixels wide and tall. These histograms tend to show a sharp peak at zero and so-called "heavy tails" at either side. The shape differs greatly from the histograms produced after applying a random filtering function, which exhibit more of a Gaussian distribution (Figure 8), as does Gabor filtering a random image (such as the one shown in Figure 2).
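The heavy-tailed signature can be reproduced with a one-dimensional caricature: a piecewise-constant signal (standing in for a scene's uniform regions and sharp edges) filtered by a small edge-matched kernel versus an arbitrary one. None of these numbers come from the authors' experiments.

```python
import numpy as np

rng = np.random.default_rng(5)

# Piecewise-constant "scene": flat stretches separated by sharp edges.
scene = np.repeat(rng.standard_normal(500), 64)

edge_filter = np.array([-1.0, -1.0, 1.0, 1.0])  # matched to step edges
other_filter = np.array([0.3, -0.8, 0.5, 1.1])  # an arbitrary, unmatched filter

def excess_kurtosis(x):
    """Zero for a Gaussian; large and positive for a sharply peaked,
    heavy-tailed distribution."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0

edge_out = np.convolve(scene, edge_filter, mode="valid")
other_out = np.convolve(scene, other_filter, mode="valid")

# The matched filter is silent on uniform regions and responds strongly
# but rarely at edges: a sparse, heavy-tailed output distribution.
print(excess_kurtosis(edge_out) > excess_kurtosis(other_out))  # True
```

The edge-matched filter's output is almost all zeros with occasional large values, the sharp-peak-plus-heavy-tails histogram of Figure 8; the arbitrary filter's output stays close to Gaussian.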

The sharp peak and heavy tails turn out to be most pronounced when the particular Gabor filter chosen resembles the receptive fields of cortical neurons. This finding suggests that these neurons are, in a sense, tuned to respond to certain patterns in natural scenes: features, such as edges, that are typical of these images but that nevertheless show up relatively rarely. So when presented with an image, only a small number of neurons in the cortex should be active; the rest will be silent. With such receptive fields, then, the brain can achieve what neuroscientists call a sparse representation.

Although studies of histograms are suggestive, they leave many questions. Might other filters be capable of representing images even more sparsely, filters that do not at all resemble the receptive fields of cortical neurons? And is the brain achieving a sparse representation by encoding just a few features and ignoring others? We began to tackle these questions in 1994. At that time, Olshausen had just completed his doctoral thesis on computational models for recognizing objects and was becoming intrigued by Field's work on natural images. Together we began developing a way to search for functions that can represent natural images as sparsely as possible while preserving all the information present.

Because this task turns out to be computationally difficult, we limited the scope of our study to small patches (typically 12 by 12 pixels in size) extracted from a set of much larger (512 by 512) natural images. The algorithm begins with a random set of basis functions (functions that can be added together to construct more complicated



Figure 8. Filtering an image requires the multiplication of pixel intensities from a small patch with corresponding values for the chosen filtering function. The sum of the products will be small if the function does not match the pattern in this portion of the image, whereas it will be large (positive or negative) if the similarity is great. Performing these operations with the specified function in all possible positions and replacing the central intensity value with the sum yields a filtered image of positive and negative values. The distribution of pixel values for such a filtered image (histograms at upper left) will reflect how well the filtering function matches features in the original scene. For example, a random function will produce a Gaussian histogram, whereas an appropriate two-dimensional Gabor function will produce a histogram with a sharp peak at zero and so-called heavy tails to either side.


ones) that are the same size as the image patches under consideration. It then adjusts these functions incrementally as many thousands of patches are presented to it, so that on average it can reconstruct each image using the smallest possible number of functions. In other words, the algorithm seeks a vocabulary of basis functions such that only a small number of "words" are typically needed to describe a given image, even though the set from which these words are drawn might be much larger. Importantly, the set of basis functions as a whole had to be capable of reconstructing any given image in the training set.
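The authors' algorithm both infers a sparse description of each patch and adapts the basis functions to the training set. The sketch below illustrates only the first half of that idea, using greedy matching pursuit as a stand-in for their inference procedure and a random (not learned) basis; all the dimensions and coefficients are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

# A random vocabulary of unit-length basis functions (24 "words" for
# 16-dimensional patches).
patch_dim, n_basis = 16, 24
phi = rng.standard_normal((patch_dim, n_basis))
phi /= np.linalg.norm(phi, axis=0)

# A patch that truly is a sum of three basis functions.
true_coeffs = np.zeros(n_basis)
true_coeffs[[2, 9, 17]] = [1.5, -2.0, 1.0]
patch = phi @ true_coeffs

def matching_pursuit(x, phi, n_words=5):
    """Greedily pick the best-matching basis function, subtract its
    contribution from the residual, and repeat: a sparse code."""
    residual = x.copy()
    code = np.zeros(phi.shape[1])
    for _ in range(n_words):
        j = np.argmax(np.abs(phi.T @ residual))
        c = phi[:, j] @ residual
        code[j] += c
        residual -= c * phi[:, j]
    return code, residual

code, residual = matching_pursuit(patch, phi)

# A handful of "words" describes the patch almost completely.
print(np.count_nonzero(code) <= 5)
print(np.linalg.norm(residual) < 0.5 * np.linalg.norm(patch))
```

A few words capture most of the patch while the other coefficients stay at zero, which is the sense in which a well-chosen vocabulary yields sparse descriptions.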

As we hoped from the outset, the basis functions that emerged from this process resemble the receptive fields of V1 cortical neurons: They are spatially localized, oriented and bandpass (Figure 9). The fact that such functions result without our imposing any other constraints or assumptions suggests that neurons in V1 are also configured to represent natural scenes in terms of a sparse code. Further support for this notion has come very recently from the work of Jack Gallant and his colleagues at the University of California, Berkeley, who showed that neurons in the primary visual cortex of monkeys do, in fact, rarely become active in response to the features in natural images.

Our results also shed new light on the utility of wavelets, a popular tool for compressing digital images, because our basis functions bear a close resemblance to the functions of certain wavelet transforms. In fact, we have shown that the basis functions our iterative procedure provides would allow digital images to be encoded into fewer bits per pixel than is typical for the best schemes now used, for example, JPEG 2000 (a wavelet-based image-compression standard now under development). Together with Michael Lewicki at Carnegie Mellon University, we are currently exploring whether this work might yield practical benefits for computer users and others who need to store and transmit digital images efficiently.

    Independence: The Holy Grail?

Our algorithm for finding sparse image codes is one of a broad class of computational techniques known as independent-components analysis. These methods have drawn considerable attention because they offer the means to reveal the structure hidden in many sorts of complex signals. Independent-components analysis was originally conceived as a way to identify multiple independent sources when their signals are blended together, and it has been quite successful at solving such problems. But when applied to image analysis, the results obtained should not really be deemed independent components.
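In its original blind-source-separation setting, the idea can be sketched in a few lines: mix two independent non-Gaussian signals, whiten the mixtures, then rotate to make the outputs maximally non-Gaussian. The brute-force rotation search below stands in for practical ICA algorithms, and all the signals and mixing weights are invented.

```python
import numpy as np

rng = np.random.default_rng(7)

# Two independent non-Gaussian sources, blended by an unknown mixture.
t = np.linspace(0, 100, 5000)
sources = np.vstack([np.sign(np.sin(1.3 * t)),   # square wave
                     rng.laplace(size=t.size)])  # heavy-tailed noise
mixing = np.array([[0.7, 0.3], [0.4, 0.6]])
mixed = mixing @ sources

# Whiten the mixtures so the remaining ambiguity is a pure rotation.
centered = mixed - mixed.mean(axis=1, keepdims=True)
evals, evecs = np.linalg.eigh(np.cov(centered))
white = np.diag(evals ** -0.5) @ evecs.T @ centered

def kurt(x):
    """Excess kurtosis, assuming unit variance after whitening."""
    return np.mean(x ** 4) - 3.0

# Search for the rotation whose output is maximally non-Gaussian.
best = max(np.linspace(0, np.pi, 1000),
           key=lambda th: abs(kurt(np.cos(th) * white[0] + np.sin(th) * white[1])))
recovered = np.cos(best) * white[0] + np.sin(best) * white[1]

# The recovered component matches one of the true sources.
corr = max(abs(np.corrcoef(recovered, s)[0, 1]) for s in sources)
print(corr > 0.9)  # True
```

Because the sources really are independent here, the most non-Gaussian direction lines up almost perfectly with one of them; it is exactly this assumption that breaks down for the objects in natural images.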

Typical images are not simply the sum of light rays coming from different objects. Rather, images are complicated by the effects of occlusion and by variations in appearance that arise from changes in illumination and viewpoint. What is more, there are often loose correlations between features within a single object (say, the parts of a face) and between separate objects (chairs, for example, often appear near tables), and independent-components analysis would erroneously consider such objects to be independent entities. So the most one can hope to achieve with this strategy is to find descriptive functions that are as statistically independent as possible. But it is quite unlikely that such functions will be truly independent.

Despite these limitations, this general approach has yielded impressive results. In a recent study of moving images, Hans van Hateren at the University of Groningen obtained a set of functions that look similar to our solutions in their spatial properties but that shift with time. These functions are indeed quite similar to the space-time receptive fields of the neurons in V1 that respond to movement in a particular direction.

Future Directions

Many other investigators are now attempting to formulate schemes for encoding more complex aspects of shape, color and motion, ones that could help to elucidate the still-puzzling workings of neurons in V1 and beyond. We suspect that this research will eventually reveal that higher levels of the visual system obey the principles of efficient encoding, just as the low-level neural circuits do. If so, then computer scientists and engineers now focusing on the problem of image compression should keep abreast of emerging results in neuroscience. At the same time,

    244 American Scientist, Volume 88 2000 Sigma Xi, The Scientific Research Society. Reproduction

    with permission only. Contact [email protected].

    Figure 9. Optimal basis functions the authors determined with their iterative algorithm can en-code any image that is the size of each patch (12 by 12 pixels). These empirical functions appearsimilar to the Gabor-like receptive fields of cortical cells (Figure 7), suggesting that the brain en-codes visual information using the smallest number of active neurons possible.

  • 8/3/2019 Olshausen Field 2000 Amsci

    9/9

    neuroscientists should pay close atten-tion to current studies of image pro-cessing and image statistics.
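The encoding idea captured in Figure 9, explaining a patch with as few active units as possible, can be sketched with a toy example. The random dictionary and the greedy matching-pursuit encoder below are illustrative stand-ins; the authors' actual algorithm learns its basis functions from natural images rather than using random ones.

```python
import numpy as np

rng = np.random.default_rng(1)
patch_dim = 144  # a 12-by-12 patch, flattened
n_basis = 144

# Stand-in dictionary: each column plays the role of one basis function.
# Random unit vectors are used here only to illustrate the encoding step.
D = rng.normal(size=(patch_dim, n_basis))
D /= np.linalg.norm(D, axis=0)

def sparse_encode(x, D, n_active=10):
    """Greedy matching pursuit: account for the patch x using at most
    n_active nonzero coefficients (loosely, 'active neurons')."""
    residual = x.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_active):
        j = np.argmax(np.abs(D.T @ residual))  # best-matching basis function
        a = D[:, j] @ residual                 # its projection coefficient
        coeffs[j] += a
        residual -= a * D[:, j]                # explain away that component
    return coeffs

x = rng.normal(size=patch_dim)      # a stand-in image patch
a = sparse_encode(x, D, n_active=10)
reconstruction = D @ a
```

Only a handful of the 144 coefficients end up nonzero, yet each greedy step strictly shrinks the unexplained residual, which is the sparse-coding trade-off in miniature.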

Some day, scientists may be able to build machines that rival people's ability to search through complex scenes and quickly recognize objects, from obscure plant species to never-before-seen views of someone's face. Such feats would be truly remarkable. But more remarkable still is that the principles used to design these futuristic devices may mimic those of the human brain.

Bibliography

Atick, J. J. 1992. Could information theory provide an ecological theory of sensory processing? Network 3:213–251.

Barlow, H. B. 1989. Unsupervised learning. Neural Computation 1:295–311.

Dan, Y., J. J. Atick and R. C. Reid. 1996. Efficient coding of natural scenes in the lateral geniculate nucleus: experimental test of a computational theory. Journal of Neuroscience 16:3351–3362.

Daugman, J. G. 1989. Entropy reduction and decorrelation in visual coding by oriented neural receptive fields. IEEE Transactions on Biomedical Engineering 36:107–114.

Dong, D. W., and J. J. Atick. 1995. Statistics of natural time-varying images. Network 6:345–358.

Field, D. J. 1987. Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A 4:2379–2394.

Field, D. J. 1994. What is the goal of sensory coding? Neural Computation 6:559–601.

Hubel, D. H., and T. N. Wiesel. 1968. Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology 195:215–244.

Laughlin, S. B. 1981. A simple coding procedure enhances a neuron's information capacity. Zeitschrift für Naturforschung 36:910–912.

Lewicki, M. S., and B. A. Olshausen. 1999. A probabilistic framework for the adaptation and comparison of image codes. Journal of the Optical Society of America A 16:1587–1601.

Marcelja, S. 1980. Mathematical description of the responses of simple cortical cells. Journal of the Optical Society of America 70:1297–1300.

Olshausen, B. A., and D. J. Field. 1996. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381:607–609.

Olshausen, B. A., and D. J. Field. 1997. Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research 37:3311–3325.

Srinivasan, M. V., S. B. Laughlin and A. Dubs. 1982. Predictive coding: a fresh view of inhibition in the retina. Proceedings of the Royal Society of London, Series B 216:427–459.

van Hateren, J. H., and D. L. Ruderman. 1998. Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex. Proceedings of the Royal Society of London, Series B 265:2315–2320.

Vinje, W. E., and J. L. Gallant. 2000. Sparse coding and decorrelation in primary visual cortex during natural vision. Science 287:1273–1276.

2000 May–June 245