Attention conservation notice: 2700+ words elaborating a presentation from a non-technical conference about AI, where the conversation devolved to "blockchain" within an hour; includes unexplained econometric jargon. Life is short, and you should have more self-respect.
I got asked to be a panelist at a November 2017 symposium at the IMF on machine learning, AI and what they can do to/for the work of the Fund and its sister organizations, specifically the work of its economists. What follows is an amplification and rationalization of my actual remarks. It is also a reconstruction, since my notes were on an only-partially-backed-up laptop stolen in the next month. (Roman thieves are perhaps the most dedicated artisans in Italy, plying their trade with gusto on Christmas Eve.) Posted now because reasons.
On the one hand, I don't have any products to sell, or even much of a consulting business to promote, so I feel a little bit out of place. But against that, there aren't many other people who work on machine learning who read macro and development economics for fun, or have actually estimated a DSGE model from data, so I don't feel totally fradulent up here.
We've been asked to talk about AI and machine learning, and how they might impact the work of the Fund and related multi-lateral organizations. I've never worked for the Fund or the World Bank, but I do understand a bit about how you economists work, and it seems to me that there are three important points to make: a point about data, a point about models, and a point about intelligence. The first of these is mostly an opportunity, the second is an opportunity and a clarification, and the third is a clarification and a criticism --- so you can tell I'm an academic by taking the privilege of ending on a note of skepticism and critique, rather than being inspirational.
I said my first point is about data --- in fact, it's about what, a few turns of the hype cycle ago, we'd have called "big data". Economists at the Fund typically rely for data on the output of official statistical agencies from various countries. This is traditional, this sort of reliance on the part of economists actually pre-dates the Bretton Woods organizations, and there are good reasons for it. With a few notable exceptions, those official statistics are prepared very carefully, with a lot of effort going in to making them both precise and accurate, as well as comparable over time and, increasingly, across countries.
But even these official statistics have their issues, for the purposes of the Fund: they are slow, they are noisy, and they don't quite measure what you want them to.
The issue of speed is familiar: they come out annually, maybe quarterly or monthly. This rate is pretty deeply tied to the way the statistics are compiled, which in turn is tied to their accuracy --- at least for the foreseeable future. It would be nice to be faster.
The issue of noise is also very real. Back in 1950, the great economist Oskar Morgenstern, the one who developed game theory with John von Neumann, wrote a classic book called On the Accuracy of Economic Observations, where he found a lot of ingenious ways of checking the accuracy of official statistics, e.g., looking at how badly they violated accounting identities. To summarize very crudely, he concluded that lots of those statistics couldn't possibly be accurate to better than 10%, maybe 5% --- and this was for developed countries with experienced statistical agencies. I'm sure that things are better now --- I'm not aware of anyone exactly repeating his efforts, but it'd be a worthwhile exercise --- maybe the error is down to 1%, but that's still a lot, especially to base policy decisions on.
The issue of measurement is the subtlest one. I'm not just talking about measurement noise now. Instead, it's that the official statistics are often tracking variables which aren't quite what you want1. Your macroeconomic model might, for example, need to know about the quantity of labor available for a certain industry in a certain country. But the theory in that model defines "quantity of labor" in a very particular way. The official statistical agencies, on the other hand, will have their own measurements of "quantity of labor", and none of those need to have exactly the same definitions. So even if we could magically eliminate measurement errors, just plugging the official value for "labor" in to your model isn't right, that's just an approximate, correlated quantity.
So: official statistics, which is what you're used to using, are the highest-quality statistics, but they're also slow, noisy, and imperfectly aligned with your models. There hasn't been much to be done about that for most of the life of the Fund, though, because what was your alternative?
What "big data" can offer is the possibility of a huge number of noisy, imperfect measures. Computer engineers --- the people in hardware and systems and databases, not in machine learning or artificial intelligence --- have been making it very, very cheap and easy to record, store, search and summarize all the little discrete facts about our economic lives, to track individual transactions and aggregate them into new statistics. (Moving so much of our economic lives, along with all the rest of our lives, on to the Internet only makes it easier.) This could, potentially, give you a great many aggregate statistics which tell you, in a lot of detail and at high frequency, about consumption, investment, employment, interest rates, finance, and so on and so forth. There would be lots of noise, but having a great many noisy measurements could give you a lot more information. It's true that basically none of them would be well-aligned with the theoretical variables in macro models, but there are well-established statistical techniques for using lots of imperfect proxies to track a latent, theoretical variable, coming out of factor-analysis and state-space modeling. There have been some efforts already to incorporate multiple imperfect proxies into things like DSGE models.
I don't want get carried away here. The sort of ubiquitous recording I'm talking about is obviously more advanced in richer countries than in poorer ones --- it will work better in, say, South Korea, or even Indonesia, than in Afghanistan. It's also unevenly distributed within national economies. Getting hold of the data, even in summary forms, would require a lot of social engineering on the part of the Fund. The official statistics, slow and imperfect as they are, will always be more reliable and better aligned to your models. But, wearing my statistician hat, my advice to economists here is to get more information, and this is one of the biggest ways you can expand your information set.
The second point is about models --- it's a machine learning point. The dirty secret of the field, and of the current hype, is that 90% of machine learning is a rebranding of nonparametric regression. (I've got appointments in both ML and statistics so I can say these things without hurting my students.) I realize that there are reasons why the overwhelming majority of the time you work with linear regression, but those reasons aren't really about your best economic models and theories. Those reasons are about what has, in the past, been statistically and computationally feasible to estimate and work with. (So they're "economic" reasons in a sense, but about your own economies as researchers, not about economics-as-a-science.) The data will never completely speak for itself, you will always need to bring some assumptions to draw inferences. But it's now possible to make those assumptions vastly weaker, and to let the data say a lot more. Maybe everything will turn out to be nice and linear, but even if that's so, wouldn't it be nice to know that, rather than to just hope?
There is of course a limitation to using more flexible models, which impose fewer assumptions, which is that it makes it easier to "over-fit" the data, to create a really complicated model which basically memorizes every little accident and even error in what it was trained on. It may not, when you examine it, look like it's just memorizing, it may seem to give an "explanation" for every little wiggle. It will, in effect, say things like "oh, sure, normally the central bank raising interest rates would do X, but in this episode it was also liberalizing the capital account, so Y". But the way to guard against this, and to make sure your model, or the person selling you their model, isn't just BS-ing is to check that it can actually predict out-of-sample, on data it didn't get to see during fitting. This sort of cross-validation has become second nature for (honest and competent) machine learning practitioners.
This is also where lots of ML projects die. I think I can mention an effort at a Very Big Data Indeed Company to predict employee satisfaction and turn-over based on e-mail activity, which seemed to work great on the training data, but turned out to be totally useless on the next year's data, so its creators never deployed it. Cross-validation should become second nature for economists, and you should be very suspicious of anyone offering you models who can't tell you about their out-of-sample performance. (If a model can't even predict well under a constant policy, why on Earth would you trust it to predict responses to policy changes?)
Concretely, going forward, organizations like the Fund can begin to use much more flexible modeling forms, rather than just linear models. The technology to estimate them and predict from them quickly now exists. It's true that if you fit a linear regression and a non-parametric regression to the same data set, the linear regression will always have tighter confidence sets, but (as Jeffrey Racine says) that's rapid convergence to a systematically wrong answer. Expanding the range and volume of data used in your economic modeling, what I just called the "big data" point, will help deal with this, and there's a tremendous amount of on-going progress in quickly estimating flexible models on truly enormous data sets. You might need to hire some people with Ph.D.s in statistics or machine learning who also know some economics --- and by coincidence I just so happen to help train such people! --- but it's the right direction to go, to help your policy decisions be dictated by the data and by good economics, and not by what kinds of models were computationally feasible twenty or even sixty years ago.
The third point, the most purely cautionary one, is the artificial intelligence point. This is that almost everything people are calling "AI" these days is just machine learning, which is to say, nonparametric regression. Where we have seen breakthroughs is in the results of applying huge quantities of data to flexible models to do very particular tasks in very particular environments. The systems we get from this are really good at that, but really fragile, in ways that don't mesh well with our intuition about human beings or even other animals. One of the great illustrations of this are what are called "adversarial examples", where you can take an image that a state-of-the-art classifier thinks is, say, a dog, and by tweaking it in tiny ways which are imperceptible to humans, you can make the classifier convinced it's, say, a car. On the other hand, you can distort that picture of a dog into an image something unrecognizable by any person while the classifier is still sure it's a dog.
If we have to talk about our learning machines psychologically, try not to describe them as automating thought or (conscious) intelligence, but rather as automating unconscious perception or reflex action. What's now called "deep learning" used to be called "perceptrons", and it was very much about trying to do the same sort of thing that low-level perception in animals does, extracting features from the environment which work in that environment to make a behaviorally-relevant classification2 or prediction or immediate action. This is the sort of thing we're almost never conscious of in ourselves, but is in fact what a huge amount of our brains are doing. (We know this because we can study how it breaks down in cases of brain damage.) This work is basically inaccessible to consciousness --- though we can get hints of it from visual illusions, and from the occasions where it fails, like the shock of surprise you feel when you put your foot on a step that isn't there. This sort of perception is fast, automatic, and tuned to very, very particular features of the environment.
Our current systems are like this, but even more finely tuned to narrow goals and contexts. This is why they have such alien failure-modes, and why they really don't have the sort of flexibility we're used to from humans or other animals. They generalize to more data from their training environment, but not to new environments. If you take a person who's learned to play chess and give them a 9-by-9 board with an extra rook on each side, they'll struggle but they won't go back to square one; AlphaZero will need to relearn the game from scratch. Similarly for the video-game learners, and just about everything else you'll see written up in the news, or pointed out as a milestone in a conference like this. Rodney Brooks, one of the Revered Elders of artificial intelligence, puts it nicely recently, saying that the performances of these systems give us a very misleading idea of their competences3.
One reason these genuinely-impressive and often-useful performances don't indicate human competences is that these systems work in very alien ways. So far as we can tell4, there's little or nothing in them that corresponds to the kind of explicit, articulate understanding human intelligence achieves through language and conscious thought. There's even very little in them of the un-conscious, in-articulate but abstract, compositional, combinatorial understanding we (and other animals) show in manipulating our environment, in planning, in social interaction, and in the structure of language.
Now, there are traditions of AI research which do take inspiration from human (and animal) psychology (as opposed to a very old caricature of neurology), and try to actually model things like the structure of language, or planning, or having a body which can be moved in particular ways to interact with physical objects. And while these do make progress, it's a hell of a lot slower than the progress in systems which are just doing reflex action. That might change! There could be a great wave of incredible breakthroughs in AI (not ML) just around the corner, to the point where it will make sense to think about robots actually driving shipping trucks coast to coast, and so forth. Right now, not only is really autonomous AI beyond our grasp, we don't even have a good idea of what we're missing.
In the meanwhile, though, lots of people will sell their learning machines as though they were real AI, with human-style competences, and this will lead to a lot of mischief and (perhaps unintentional) fraud, as the machines get deployed in circumstances where their performance just won't be anything like what's intended. I half suspect that the biggest economic consequence of "AI" for the foreseeable future is that companies will be busy re-engineering human systems --- warehouses and factories, but also hospitals, schools and streets --- so to better accommodate their machines.
So, to sum up:
The classic paper on this, by, inter alia, one of the inventors of neural networks, was called "What the frog's eye tells the frog's brain". This showed how, already in the retina, the frog's nervous system picked out small-dark-dots-moving-erratically. In the natural environment, these would usually be flies or other frog-edible insects.^
Distinguishing between "competence" and "performance" in this way goes back, in cognitive science, at least to Noam Chomsky; I don't know whether Uncle Noam originated the distinction.^
The fact that I need a caveat-phrase like this is an indication of just how little we understand why some of our systems work as well as they do, which in turn should be an indication that nobody has any business making predictions about how quickly they'll advance.^