The WSJ ran a comparison test of the major AI platforms in which Perplexity came in first and Microsoft’s Copilot came in last. The current version of ChatGPT came in second, and since Copilot is built on OpenAI’s technology, that suggests Copilot should improve soon. But overall, the most common platforms, particularly Microsoft’s, sucked badly, providing answers that were either wrong or unhelpful. This failure is often called “overpromising and underdelivering,” and it can do significant damage to a company’s brand and image by turning it from something you trust into something you don’t.
IBM ran into this problem head-on in the late 1980s, a crisis that ultimately cost then-CEO John Akers his job; ironically, Akers was arguably the most qualified CEO IBM had had in some time. But he had been insulated by staff who curated everything he saw, and he was blindsided when the wheels came off the IBM bus.
Let’s talk about the problem with AI generally, and specifically about the problem of overpromising and underdelivering.
Google Glass
The problem with Microsoft’s Copilot is that it is not ready for prime time. That was also the problem with Google Glass. When Google Glass came out, Google marketed it as if it were a finished offering when it was really a beta product that people paid full price for. The problems that followed arose because the product was half-baked and because bystanders were not ready to deal with someone wearing a camera that could record everything the user saw. There were some nasty privacy issues and bad reactions from people with poor anger management.
This not only left users very unsatisfied, it also created a substantial trust problem between them and Google. When a product is successful, early adopters turn into advocates, and in this era of social media, advocates are immensely powerful. But if you piss those people off, they become far more powerful critics. The result was that not only did Google Glass die, but Google exited the segment entirely. I expect that many of these would-be advocates are still doing damage to Google’s brand every chance they get, even though their experience was over a decade ago.
Copilot Is Not Cooked
Copilot is a bit of a mess right now, and that is very problematic when we talk about people using the tool to code. When I used to code, I hated checking my work, and I hated checking someone else’s work even more. The reason is that people often do not comment their code, making it far more difficult to determine what they intended to do, let alone whether they did it the right way (the sketch below illustrates the difference comments make). Writing code is fun; checking code, not so much. Ideally, Copilot should have been developed first to assure the quality of code written by others. Only once its accuracy was confirmed should it have pivoted to both writing code and independently checking what it wrote. That way, quality would have been assured before the product was used to increase speed.
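To make that review burden concrete, here is a small Python sketch. The function names and the formula are hypothetical, my own illustration rather than anything from Copilot or the WSJ test: both functions compute the same thing, but only the commented one lets a reviewer check intent against implementation without reverse-engineering it.

```python
# Hypothetical example: names and formula are illustrative only,
# not drawn from Copilot or the WSJ comparison.

def adj(p, r, n):
    return p * (1 + r / n) ** n


def compound_annual_value(principal, annual_rate, periods_per_year):
    """Return the value of `principal` after one year of compound interest,
    compounded `periods_per_year` times at `annual_rate` (0.05 means 5%)."""
    # Standard compound-interest formula: P * (1 + r/n)^n
    return principal * (1 + annual_rate / periods_per_year) ** periods_per_year


if __name__ == "__main__":
    # Both print the same number; only one tells the reviewer what "correct" means.
    print(adj(1000, 0.05, 12))                    # 1051.16..., but is that intended?
    print(compound_annual_value(1000, 0.05, 12))  # documented intent, easy to verify
```

A reviewer can verify the second function against its stated purpose in seconds; the first requires guessing what the author meant before any checking can even begin, which is exactly the problem with reviewing machine-generated code at scale.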
Whether we are talking about writing code or launching a space shuttle, when quality takes second place to speed (a very common mistake), bad things happen. With a tool that can write code at computer speed, the sheer number of problems created could overwhelm a QC process, particularly one done by a human, causing it to miss a critical defect and leading to a catastrophic outcome.
As a result, the release of what feels like an early beta product is leading people to distrust Copilot, and Microsoft overall. Microsoft overpromised what its tool could do and then underdelivered against that promise. Given how comparatively well the later version of ChatGPT did, I expect Copilot will improve significantly as OpenAI’s core advances transfer to Microsoft, particularly since OpenAI has recently made significant improvements to its own QC capabilities. (This came after its unfortunate decision to disband its long-term risk team.)
Wrapping Up
The problem is not just Copilot. The industry, with IBM as a notable exception, has focused excessively on productivity and not enough on assuring quality. The result has been an increasingly poor customer experience, which has undoubtedly damaged not only Microsoft’s brand but also faith in AI in general.
This highlights what we should be focusing AI on: decision support, because an all-too-common problem is the bad decision to release products before they are ready. Copilot is a case in point, since the product should be ready for release by the end of the year. Had Microsoft simply released it as a beta and aimed it at users accustomed to beta products, the brand damage would have been reduced, and people would be far less likely to reject AI.
Quality and decision support, not productivity, should be the initial focus of AI. That way, we can keep bad decisions and low-quality AIs from proliferating at unmanageable levels. And that brings us back to the recurring problem of overpromising and underdelivering.