Neural Nets, Connectionism, Perceptrons, etc.

11 Apr 2022 12:13

Old notes from c. 2000

I'm mostly interested in them as a means of machine learning or statistical inference. I am particularly interested in their role as models of dynamical systems (via recurrent nets, generally), and as models of transduction.

I need to understand better how the analogy to spin glasses works, but then, I need to understand spin glasses better too.

I find unconvincing the arguments that connectionist models are superior, for purposes of cognitive science, to more "symbolic" ones. (Saying that they're more biologically realistic is like saying that cars are better models of animal locomotion than bicycles, because cars have four appendages in contact with the ground and not two.) This is not to say, of course, that some connectionist models of cognition aren't interesting, insightful and valid; but the same is true of many symbolic models, and there seems no compelling reason for abandoning the latter in favor of the former. (For more on this point, see Gary Marcus.) --- Of course a cognitive model which cannot be implemented in real brains must be rejected; connecting neurobiology to cognition can hardly be too ardently desired. The point is that the elements in connectionist models called "neurons" bear only the sketchiest resemblance to the real thing, and neural nets are no more than caricatures of real neuronal circuits. Sometimes sketchy resemblances and caricatures are enough to help us learn, which is why Hebb, McCulloch and Neural Computation are important for both connectionism and neurobiology.

Reflections circa 2016

I first learned about neural networks as an undergraduate in the early 1990s, when, judging by the press, Geoff Hinton and his students were going to take over the world. (In "Introduction to Cognitive Science" at Berkeley, we trained a three-layer perceptron to classify characters as "Sharks" or "Jets" using back-propagation; I had no idea what those labels meant because I'd never seen West Side Story.) I then lived through neural nets virtually disappearing from the proceedings of Neural Information Processing Systems, and felt myself very retro for including neural nets the first time I taught data mining, in 2006. (I dropped them by 2009.) The recent revival, as "deep learning", is a bit weird for me, especially since none of the public rhetoric has changed. The most interesting thing scientifically about the new wave is that it's led to the discovery of adversarial examples, which I think we still don't understand very well at all. The most interesting thing meta-scientifically is how much the new wave of excitement about neural networks seems to be accompanied by forgetting earlier results, techniques, and baselines.

Reflections in 2022

I would now actually say there are three scientifically interesting phenomena revealed by the current wave of interest in neural networks:

  1. Adversarial examples (as revealed by Szegedy et al.), and the converse phenomenon of extremely high confidence classification of nonsense images that have no humanly-perceptible resemblance to the class (e.g., Nguyen et al.);
  2. The ability to generalize to new instances by using humanly-irrelevant features like pixels at the edges of images (e.g., Carter et al.);
  3. The ability to generalize to new instances despite having the capacity to memorize random training data (e.g., Zhang et al.).
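(For item (1), one standard intuition, due to Goodfellow et al., is that adversarial examples already arise for plain linear classifiers in high dimensions: a perturbation of size epsilon per coordinate can move the decision score by epsilon times the L1 norm of the weights. Here is a minimal numpy sketch of that linear story; the weights, input, and dimensions are all made up for illustration, and this is not anyone's actual attack on an actual network.)

```python
import numpy as np

# Toy linear "classifier": predicted class is sign(w @ x).
# All quantities here are invented for illustration.
rng = np.random.default_rng(1)
w = rng.normal(size=100)   # weights
x = rng.normal(size=100)   # an input the classifier currently labels sign(score)

score = w @ x
# Choose a per-coordinate budget just big enough to flip the score's sign.
eps = (abs(score) + 1.0) / np.abs(w).sum()

# Fast-gradient-sign-style step: nudge every coordinate by eps against the score.
x_adv = x - eps * np.sign(w) * np.sign(score)

# Each coordinate moved by only eps, but the score moved by eps * ||w||_1,
# so the predicted class flips even though x_adv is close to x.
print(np.sign(score), np.sign(w @ x_adv), eps)
```

The point of the sketch is the dimension effect: the per-coordinate change eps stays small while the score shifts by eps times the L1 norm of the weights, which grows with dimension.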

It's not at all clear how specific any of these are to neural networks. (See Belkin's wonderful "Fit without Fear" for a status report on our progress in understanding my item (3) using other models, going all the way back to margin-based understandings of boosting.) It's also not clear how they inter-relate. But they are all clearly extremely important phenomena in machine learning which we do not yet understand, and really, really ought to understand.
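(To see that item (3) really isn't specific to deep networks, here is a scaled-down sketch in the spirit of the Zhang et al. experiment, with random tanh features plus least squares standing in for a deep net; the sizes and the feature map are made up for illustration. Any model with more adjustable parameters than training points can interpolate pure-noise labels.)

```python
import numpy as np

# A model with 200 output weights but only 50 training points can
# fit coin-flip labels exactly.  (Toy stand-in for a deep net.)
rng = np.random.default_rng(0)
n, d, width = 50, 5, 200

X = rng.normal(size=(n, d))
y_noise = rng.choice([-1.0, 1.0], size=n)   # labels are pure noise

W = rng.normal(size=(d, width))     # fixed random first layer
Phi = np.tanh(X @ W)                # random features, shape (n, width)

beta = np.linalg.pinv(Phi) @ y_noise    # minimum-norm least-squares fit
train_acc = np.mean(np.sign(Phi @ beta) == y_noise)
print(train_acc)   # 1.0: the model memorizes noise perfectly
```

The puzzle, of course, is not that such a model *can* memorize noise, but that models with this capacity nonetheless generalize when the labels carry real structure.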

I'd add that I still think there has been a remarkable regression of understanding of the past of our field and some hard-won lessons. When I hear people conflating "attention" in neural networks with attention in animals, I start muttering about "wishful mnemonics", and "did Drew McDermott live and fight in vain?" Similarly, when I hear graduate students, and even young professors, explaining that Mikolov et al. 2013 invented the idea of representing words by embedding them in a vector space, with proximity in the space tracking patterns of co-occurrence, as though latent semantic indexing (for instance) didn't date from the 1980s, I get kind of indignant. (Maybe the new embedding methods are better for your particular application than Good Old Fashioned Principal Components, or even than kernelized PCA, but argue that, dammit.)
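(The old idea is easy to exhibit: latent semantic indexing is just a truncated SVD of a term-document count matrix, and proximity in the resulting low-dimensional space already tracks co-occurrence. Here is a toy numpy sketch; the five-document corpus and the choice of two dimensions are invented for illustration, and real LSI would use a much larger corpus and tf-idf-style weighting.)

```python
import numpy as np

# Toy corpus: three "animal" documents, two "market" documents.
docs = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
    "stocks fell on the market",
    "the market rose on earnings",
]
vocab = sorted({w for doc in docs for w in doc.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix.
C = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for w in doc.split():
        C[idx[w], j] += 1

# Truncated SVD: rows of U_k * S_k serve as word embeddings.
U, S, Vt = np.linalg.svd(C, full_matrices=False)
k = 2
emb = U[:, :k] * S[:k]

def cos(a, b):
    va, vb = emb[idx[a]], emb[idx[b]]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))

# Words sharing contexts end up close: cat is nearer dog than market.
print(cos("cat", "dog"), cos("cat", "market"))
```

Nothing here is newer than the 1980s; the live question is whether particular modern embedding methods beat this (or kernelized variants) on a given task, which has to be argued, not assumed.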

I am quite prepared to believe that part of my reaction here is sour grapes, since deep learning swept all before it right around the time I got tenure, and I am now too inflexible to really jump on the bandwagon.

That is my opinion; and it is further my opinion that you kids should get off of my lawn.