Is Google’s Chatbot Sentient? “Logical” Reasons to Disagree

“Chatbot” from Wikimedia Commons

I have strong reasons for thinking it's useful to weigh in on the recent drama around a Google Ethics Engineer's declaration that the LaMDA product has reached sentience. Slate has a great article about this that may bring you up to speed if you're interested.

Slate’s position on why the assignment of the sentient label to the chatbot was misguided revolves around LaMDA’s complete reliance on human inputs and foundational language models. My assessment extends their position in a different direction and I’ll explain why.

Deductive Logic

All of these foundational models rely on two types of logic that are common in the software community. The most common is deductive logic, in which software compares the truth (or falsity) of multiple assertions to determine which actions to take. This is a high-level explanation (forgive me) that summarizes a significant body of research and work in deduction, but in general, deduction describes the application of explicit rules and logic.
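To make that concrete, here is a minimal sketch of deductive logic in Python. The sensor names and thresholds are hypothetical, but the shape is the point: explicit rules evaluate assertions and determine the action.

```python
# A minimal sketch of deduction, assuming hypothetical sensor names and
# thresholds: explicit if/then rules compare the truth of assertions and
# determine which action to take.

def decide_irrigation(soil_moisture: float, rain_expected: bool) -> str:
    """Apply simple if/then rules (deduction) to choose an action."""
    if rain_expected:
        return "wait for rain"        # rule 1: a rain forecast overrides watering
    if soil_moisture < 0.2:
        return "irrigate"             # rule 2: dry soil and no rain -> water now
    return "do nothing"               # no rule fired

print(decide_irrigation(soil_moisture=0.15, rain_expected=False))  # -> irrigate
```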

Inductive Logic

Inductive logic is useful for drawing inferences or conclusions from historical observations. If you draw eight black marbles out of a bag in your first eight tries, you might infer from this data that the bag is full of black marbles. Induction has experienced a resurgence in software thanks to the recent interest in deep learning and other machine learning techniques. Machine learning is a form of inductive logic in which historical data are trained into models that allow the machine to infer likely outcomes from currently sensed parameters. In the example of my weather data systems, I have sampled parameters like temperature, pressure, humidity, luminosity, etc., for years. I have also captured when rain occurs (easy to do in Tucson… it doesn't happen much) and can then label every example of weather data with "rain" or "not-rain". THEN, if I want to predict whether it will rain in the near future, I run inference against my trained weather-rain model using the current values of the weather sensors. This is a very simple description of how machine learning works.
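As a concrete illustration of induction, here is a minimal sketch in the spirit of the rain/not-rain example: a classifier trained on historical sensor readings is used to infer the likelihood of rain from the current readings. The file name, column names, and model choice are my assumptions, not the actual pipeline behind my weather system.

```python
# A minimal sketch of induction via machine learning. The data file, columns,
# and model are hypothetical stand-ins for the weather example above.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("weather_history.csv")         # hypothetical logged sensor data
X = df[["temperature", "pressure", "humidity", "luminosity"]]
y = df["rain"]                                  # 1 = rain observed, 0 = not-rain

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# Inference: estimate the likelihood of rain from the current sensor readings.
current = pd.DataFrame([{"temperature": 21.0, "pressure": 1008.0,
                         "humidity": 0.83, "luminosity": 120.0}])
print("probability of rain:", model.predict_proba(current)[0][1])
```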

Combinations of Logic

Much current software relies on a combination of the two: traditional deductive logic decides when to incorporate inductive inferences in order to solve problems most effectively. I often picture traditional software logic evaluating and connecting hundreds and hundreds of small trained machine learning models; that is one example of combining these two kinds of logic. This, very simply stated, is what large foundational models like LaMDA and GPT-3 are doing. The difference is that they generally use deductive logic rules plus VERY LARGE trained models. Most of these foundational models are so large and computationally expensive that most people can't use them in any format other than the toy applications provided by Google or OpenAI. The very large body of language used to train these foundational models allows them to do incredible inference based on language created by real humans; all the text in Wikipedia is one example of the language used to train them. Inferencing these models with questions from humans (such as the Google employee) can yield surprising, even spooky results, and deductive logic rules can eliminate ridiculous or meaningless responses.
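Here is a minimal sketch of how the two kinds of logic can be combined. The `generate_candidates` function is a purely hypothetical stand-in for inference into a large trained model, and the filtering rules are invented for illustration; the pattern, not the specifics, is the point.

```python
# A minimal sketch of combining induction and deduction: a (hypothetical)
# trained model proposes candidate responses, and explicit deductive rules
# decide which candidate, if any, is acceptable.

BANNED_FRAGMENTS = ["lorem ipsum", "error:"]       # hypothetical rule inputs

def generate_candidates(prompt):
    # Placeholder for calling a large trained language model (induction).
    return ["It will likely rain this afternoon.",
            "Rain rain rain rain."]

def respond(prompt):
    for text in generate_candidates(prompt):        # inductive inference
        words = text.lower().split()
        repetitive = len(set(words)) < len(words) / 2
        banned = any(b in text.lower() for b in BANNED_FRAGMENTS)
        if not repetitive and not banned:            # deductive filtering rules
            return text
    return "I don't have a good answer to that."

print(respond("Will it rain today?"))
```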

What’s missing? (Who knows what lies in the heart of a machine?)

Despite the fact that these foundational models can be VERY useful, they’re missing something major that prevents them from truly understanding language. How can I say this with confidence?

Abductive Logic is What’s Missing!

It is easy to point out that machines do not (and will not in the near future) have the capacity for abductive logic. Abduction describes the human ability to make an observation Q and conclude that some general principle P must be the reason that Q is true. Notice that this is quite different from deduction and induction. The complexity of the various principles at work in the world makes abduction very difficult to perform. Sherlock Holmes was a renowned expert in abduction: seeing, for instance, a wedding ring that was shinier on the inside than the outside, he could conclude with no further information that a person who removed the ring frequently might leave it in that condition. Machines are not able to make these kinds of intuitive "leaps". Our current, modern view holds that science itself is an example of abduction: we seek hidden principles or causes from which we can deduce the observable facts, as in "frequently removing a ring might explain why it is shiny and clean on the inside but not the outside".

There is plenty of research out there telling us that machines cannot perform abductive logic. Part of the reason is that in abduction, a likely hypothesis needs to be inferred from a nearly infinite set of possible explanations. Something in the human brain protects us from getting locked in the infinite loop that evaluating all of these explanations would require; it is likely some mashup of intuition, mental models of rules, and value systems that lets us jump to the most likely causes to explain the data. To go deeper, Mindmatters has a great discussion of all these concepts here, and they also have a three-part series on "The Flawed Logic Behind Thinking Computers": Part 1, Part 2, and Part 3. There are many more articles out there that explain this gap in machine intelligence, including this one from VentureBeat.

Abduction and Natural Language

There is a growing body of work indicating that abductive reasoning is part of the reason humans can understand language (NeurIPS Proceedings link). Some of this is due to the need to interpret and decode errors in language. A famous example comes from Don Quixote, where Sancho Panza, Don Quixote's assistant, says: "Senor, I have educed my wife to let me go with your worship wherever you choose to take me." Don Quixote, immediately identifying the improper usage, replies, "INDUCED, you would say, Sancho. Not EDUCED." By our definition of abduction, we can see that Don Quixote uses abductive logic here when he adopts the hypothesis that "induced" is the intended word, given the context and the similarity between the two words. According to Donald Davidson, this kind of abductive interpretation can occur in natural language understanding when:

  1. Applying a hypothesis to understand new names or labels
  2. Revising prior beliefs or interpretations about particular phrases
  3. Altering interpretations of predicates or other grammatical constructs to fit the context


In light of the growing number of applications of machine learning, there has been much more discussion of deductive and inductive reasoning than there was even ten years ago. It's likely you've seen some of this.

It does appear, however, that the understanding of abductive logic is lagging. Though there have been efforts to simulate machine abduction, it has yet to be accomplished, and for legitimate computational tractability reasons it is unlikely to be accomplished on traditional (non-quantum) computing. This severely limits a machine's capacity for true natural language understanding, which any sentient being would need in order to understand language and communicate. This applies to chatbots as well, and explains why they are just examples of the Chinese Room (or a human-language-speaking parrot), neither of which demonstrates understanding of the language emanating from it.

Organizing for AI&ML Success – from Conway’s Law to the CDAO

Here’s a topic that I have given a great deal of thought to after observing lots of examples of how companies organize to identify, sense, collect, and use their business data. In a nutshell, HOW a company chooses to organize their data strategy and teams determines how successful they will be in delivering business value through data. Why is this? Conway’s Law gives us the reasons…

Conway’s Law

In short, in 1967, Melvin Conway, a computer programmer, proposed that organizations design systems that mirror their own communication structure. This sounds very simple, but I'll give some examples of why it provides really great insight into the power of architecting organizations around desired business outcomes.

First, why does this make sense?

Conway suggested that the architecture of products built by organizations that are broken into functional competencies will tend to reflect those functions. For instance, a firm with four functions (mechanical engineering, electrical engineering, software engineering, and signal processing) will develop applications with distinct modular capabilities that reflect those functions. A module that manages thermal loads, center of gravity, control systems, structural sensing, and power will emerge and be developed by the mechanical engineering group. This module will interface with another module containing embedded processing and memory, through interfaces that carry power and sensor data; that second module, of course, will be developed by the electrical engineering team. The software engineering team will develop a module that is loaded into the electrical engineering team's processing system through some programming interface, receives signals from sensors as well as from elements within the mechanical engineering module, and uses logic to make decisions. The signal processing team will develop a module that is triggered by signals from the software engineering module and provides outputs that interface with control modules in the mechanical engineering module. Phew! See below for a very high-level visualization of how this might occur. Note how each department "owns" its own content and then someone (hopefully a systems architect or systems engineer) manages the interfaces.

Very high-level block diagram demonstrating Conway’s Law – Tod Newman, 2022

Conway’s Law and Data Science / AI&ML

I have seen Conway's Law borne out over and over with regard to data strategy in organizations. Organization one (let's say Mechanical Engineering) understands its business function well and is intent on optimizing for that function. It develops a strategy around data collection, storage, and analysis that helps it achieve its goals. Organization two (Finance, we'll say) does the same thing. Then organization three follows suit, and so on. Eventually what we have is 10-15 different data silos, each of which works relatively well for its owner (but each of which requires attention and sustainment, something that's not always present). However, in traditional organizations (companies not named Uber or Google or SpaceX or similar) there is rarely a central figure like the systems architect who designed the complete business data system and who manages the interfaces. Conway's Law therefore results in the isolation of multiple, locally valuable data sources. And because these organizations design their data strategies around their own unique needs, there is frequently not even a clear way to connect the data stores!

Are there Solutions?

There are lots of examples of companies that have avoided the bulk of this negative effect by designing a centralized data strategy up front. As I alluded to earlier, these companies are often data firms that offer a service, like Google or Uber; they were born as data companies and developed that way from the ground up. If you're not lucky enough to be a company that was born a data firm, however, there are some possibilities, though I think they can be difficult and involve deliberate culture change management.

  1. Centralize the Data Strategy and Empower an Owner: This role has traditionally been called the Chief Data Officer, and these days I'm noticing a positive trend towards redefining it as the Chief Data and Analytics (or AI) Officer. Here's a good explanation of the difference. This signals to the organization that data is now seen as a central business asset rather than simply a local asset. As the Harvard Business Review states, the trend towards naming CDOs or CDAOs "reflects a recognition that data is an important business asset that is worthy of management by a senior executive" and that it is "also an acknowledgement that data and technology are not the same and need different management approaches." Note that redefining and centralizing the organization can leverage the positive aspects of Conway's Law towards the goal of integrated, aligned data sources.
  2. Identify "low-hanging fruit" in your existing data silos for integration. You may be lucky and have a common key (employee number, part number, etc.) between two data silos that enables the data to be joined. This assumes that you can get permission from the silo owner to see the data, however, which might be a large assumption. Regardless, a demonstration of the power of integrated data could make the case for the difficult decisions and culture shifts (from local to collective ownership of data).
  3. Make a mandate. Jeff Bezos (legendarily) made his API Mandate at Amazon which required all data and functionality to be exposed publicly across Amazon through a defined interface called an Application Programming Interface (API). This interface managed both access to the data as well as insight into the structure of the data. It is said that this mandate changed the company and enabled their future high-value Amazon Web Services business.


If you’ve made it this far, then you probably have the gist of my argument. If you’ve skipped to the conclusion, here’s what I’d want you to know:

  1. Organizations that let each function build its own Data Strategy from scratch will fall into the Conway's Law trap and are unlikely to end up with well-understood data interfaces.
  2. Conversely, a carefully-architected Data Strategy (covering everything from the design of the information to be sensed, the sensing approach, collection, and the application of Data Science) can be a surprisingly powerful lever for gaining business value. Some of the largest returns on internal process-improvement investments I'm aware of inside large firms come from joining previously unconnected data sources and gaining a new, valuable insight for decisions, risk management, or even a better understanding of the flow of business value from suppliers to the hands of the customer.
  3. It is hard to apply a new Data Strategy to an existing business culture. Unless you are leading an amazing business culture, it will require change management techniques (like John Kotter’s 8 steps) to succeed.
  4. An empowered role like the CDO, or better, the CDAO, may help with this culture change and can issue the kinds of "Bezos API mandates" that might be needed for success. It can also help with the next challenge, Sustaining the Data Business.

From Documents to Knowledge – Simple Ways of Building and Questioning Knowledge Graphs

Here’s an applied approach to the hard problem of what is referred to as “knowledge representation“, where we provide structures for machines to capture information from the world and represent it as knowledge that can be used to solve problems. There’s a long history of research into this challenging field and much of that research has failed to result in simple, approachable methods.

As someone who thinks hard about building intelligent assistants that enable more effective human decisions (rather than intelligent agents that make their own decisions), I have spent time and energy approaching the knowledge representation problem from that context. This means I work to build systems that can extract and build knowledge from sets of texts and documents far larger than any human could read through. Such a system can then present the human decision maker with information visualized in a simpler way that improves their decisions.


Context: I was looking for a set of documents on which to demonstrate my techniques around the time the Ukraine war was about to begin. As it turned out, there had been numerous reports and analyses developed anywhere from six months before the war began right up to the days before it started.

Goal: Determine AFTER the war began if there was anything in the early analyses that predicted what was going to happen.

Outcome: As you’ll see, interesting predictions could be distilled out of “questions” presented to the knowledge graph.


  1. One of the hard problems with this kind of analysis is pulling data out of the texts that one finds scattered across the internet. I tend to use search engines to find files that I download and then process in bulk. Generally these documents are in PDF format, which makes them a bit harder to process. Automating accurate processing of PDF files is beyond the scope of this post, but it's a bridge that probably must be crossed by anyone interested in Natural Language Processing and Knowledge Representation.
  2. Knowledge Graphs: Building a knowledge graph isn't nearly as difficult as it sounds, but it requires a few things. A toolkit like the Python Natural Language Toolkit (nltk) is very useful, as it has the necessary ingredients like sentence tokenization, word tokenization, and parts-of-speech classification. Here's a great overview from a notebook on Kaggle, the data science competition site. One will first use all the downloaded texts to build a "master" knowledge graph that consists of Subject->Action->Object "triplets" assembled into a network graph. This graph will be incredibly dense, but what will emerge are central concepts that are frequently noted in the texts.
  3. "Questioning" the Knowledge Graphs: This may also be viewed as filtering central topics out of the master knowledge graph by asking questions of it. For instance, the question "Will Russia's invasion trigger an economic impact and increased immigration?" provides a filtered view of the master knowledge graph that looks like the one below:
Ukraine War Knowledge Graph filtered by "Economic Impact" and "Immigration"

If you look closely, you will notice that the nodes (blue) are nouns and an arrow points from the subject to the object. The arrow is referred to as an "edge" and is labeled with the action verb in red. This is interesting and makes nice pictures in presentations and papers, but it becomes truly useful when the filtered graph is converted into a table of triplets that point back to the context in the document where each triplet was extracted. At that point, the researcher can find the sentences that generate the "answers" to the question. See the example of this context below.

Context sentences corresponding with Knowledge Graph search for “Economic and Immigration Impact of Ukraine Conflict”

As you can see, there were multiple discussions of immigration and economic challenges in the set of documents and the “answers” to the question found in these documents are captured in the table (Note: I’m just showing the first few rows of the answers). If one wanted to conduct a very thorough literature search of a much larger set of documents, it is likely that this method could save countless hours of digging through documents and enable quicker and better decisions on the subject.
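For the curious, here is a minimal sketch of the triplet-extraction and graph-filtering idea described above, using nltk and networkx. The extractor is deliberately crude (first noun, first verb, last noun per sentence), and the input file and keyword "question" are hypothetical; a real pipeline would use a proper dependency parser.

```python
# A minimal sketch: rough Subject->Action->Object triplets via nltk, assembled
# into a networkx graph, then "questioned" by filtering on topic keywords.
import nltk
import networkx as nx

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def extract_triplets(text):
    """Return rough (subject, action, object) triplets from raw text."""
    triplets = []
    for sent in nltk.sent_tokenize(text):
        tags = nltk.pos_tag(nltk.word_tokenize(sent))
        nouns = [w.lower() for w, t in tags if t.startswith("NN")]
        verbs = [w.lower() for w, t in tags if t.startswith("VB")]
        if len(nouns) >= 2 and verbs:
            triplets.append((nouns[0], verbs[0], nouns[-1]))
    return triplets

# Build the "master" knowledge graph from one (hypothetical) downloaded report.
graph = nx.DiGraph()
for subj, verb, obj in extract_triplets(open("report.txt").read()):
    graph.add_edge(subj, obj, action=verb)

# "Question" the graph: keep only edges that touch the topics of interest.
keywords = {"economy", "immigration", "sanctions"}
filtered = graph.edge_subgraph(
    [(u, v) for u, v in graph.edges if u in keywords or v in keywords]
)
print(list(filtered.edges(data=True)))
```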

The Hidden Information that can be Extracted from Texts

The above title might sound boring, but it’s probably the area of “Artificial Intelligence” that I’m most excited about. If you squint just right, you can probably understand what I mean when I say that Texts are generated by Topics filtered through the human mind. What if we could uncover some of those Topics by processing the Texts in some way? What if we could also uncover information about the human mind that is evaluating those Topics?

Here's a great link that gives some insight into the questions above. It comes from Radiolab (NPR) and presents a couple of really interesting examples:

  1. The story of Agatha Christie’s novels. A researcher conducted word frequency analysis (probably something like the tf-idf technique) and found that Agatha’s first 72 novels had very similar statistics around word choice and vocabulary. But the 73rd novel showed a huge shift that revealed something that was likely happening in Agatha Christie’s brain. This has a lot of interesting implications! Click the link above to listen to or read the story on Radiolab.
  2. A similar story about an amazing study done by the University of Minnesota, which followed more than 600 nuns over the years with memory and cognitive assessments. At some point the dataset of "entrance essays" written by each of the sisters was discovered, and the researchers learned that the information and grammar in the essays, written in the sisters' youth, correlated with memory issues in their older age. This is correlation, not causality, of course, but still fascinating.

These are the kinds of analyses that I like to do on all sorts of text. There are techniques that allow me to uncover hidden (or "latent") topics from large quantities of text, and sometimes what they reveal is spooky! There are all sorts of other analyses as well, like the word-frequency ones from the Agatha Christie study and the grammar and idea-density metrics used in the University of Minnesota study. This may well be one of the most useful near-term applications of AI that we have today, one that is even able to reveal hidden truths about ourselves.
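Here is a minimal sketch of the two kinds of analysis mentioned above: word-frequency profiles via tf-idf and "latent" topic discovery via LDA, using scikit-learn. The toy documents stand in for a real corpus, and this is not the exact tooling behind the studies described.

```python
# A minimal sketch: tf-idf word-frequency fingerprints and LDA latent topics.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the detective noticed the worn ring and the faded telegram on the desk",
    "the detective questioned the gardener about the missing telegram",
    "grain prices and rail shipments shifted sharply after the border closed",
]

# Word-frequency fingerprint of each document (the Agatha Christie-style analysis).
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
print(tfidf.shape)                      # (documents, vocabulary terms)

# Hidden ("latent") topics across the corpus.
count_vec = CountVectorizer(stop_words="english")
counts = count_vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
vocab = count_vec.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    print(f"topic {i}:", [vocab[j] for j in topic.argsort()[-4:]])
```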

Transforming into a Resilient Digital Business Requires a Data Strategy

During the COVID outbreak, I have written extensively about the impact of the pandemic on regions and individuals. One of the unsurprising outcomes of COVID-19 is that organizations that were prepared and could transform into a full-time "data business" saw great advantages. Conversely, organizations that were not prepared and remained stuck in the old economy struggled mightily.

Grubhub: Data Company Disguised as a Food Delivery Firm

One firm (as we all know) that benefitted from COVID-19 was Grubhub. Its revenues grew from $1.3B to $1.8B from 2019 to 2020, which comes out to around 38% growth. Their 2021 revenues are likely to be much larger, as they saw Q1 revenue of around $550M. Why is this important to know? The leaders in this market segment made lots of money during COVID primarily because of the digital transformation preparation they did in the handful of years leading up to 2019.

The Digital Transformation Approach Taken by the Food Delivery Service Sector

Here are a handful of things that the leaders in this sector thought wise before 2019 and that turned into wins during 2020 and 2021. Grubhub in particular is known as a true champion of digital technology. One of the ways it sought to strengthen its partner restaurants is through its "Grubhub for Restaurants" data analytics services. At this Grubhub site, the company discusses data insights their partner restaurants can use to revolutionize their own businesses. They list a number of metrics that can provide their partners with insight into potential areas of growth. Some of these include:

  1. Delivery Speed. This is an interesting metric to me, because it reflects the flow of goods from raw materials to the hands of the customer. In factories, it is common to build large value stream maps that detail all of the value added to raw materials through factory operations as the product makes its way through. This can reveal bottlenecks in the factory that fundamentally limit how much money one can make. Grubhub recommends that its partners research alternate routes or techniques to shave minutes off their value stream. I'd imagine that if Grubhub were smart, they would also sell value stream data services to their partners to help them optimize. If they're not, I ought to offer my services, as this is right up my alley!
  2. Average Order Size. This is another good metric that restaurants ought to collect consistently. It is a measure that can also increase cash flow and profitability, because it measures a company’s ability to upsell. Often, I’d suspect that the goods being upsold are higher profit goods like dessert, coffee, and drinks.
  3. Customer Reviews. I've noted that smart firms patrol their reviews carefully and collect these reviews as data, both to improve their performance and to demonstrate their business virtue. A respectful and thoughtful response to a bad review could well result in many times more business than one might expect. This data could also be aggregated and clustered with artificial intelligence techniques like natural language processing to identify the types of feedback.
  4. Order Accuracy: This is another interesting metric. I suspect most restaurants or similar firms don’t collect this data assiduously, but I suspect a strong, good-faith technique to gain order accuracy feedback from customers could result in a really valuable data set. Perhaps offering drawings for free rewards for providing feedback on order accuracy would be low-cost and high-reward to the restaurant.
  5. Average Orders Per Day: This is relatively low-end data… I believe one could greatly improve on this data feature. At a minimum, trends in orders per day combined with other data features like accuracy and review results could form a small predictive dataset. Ultimately this could be used to make fairly accurate predictions of business trends per day or week (see the sketch below), which might help optimize costs like material and labor. Given time and information on a firm, I could certainly think up many more valuable data features to measure that could improve the results of these kinds of predictive analytics.
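Here is a minimal sketch of the kind of small predictive model item 5 hints at: daily order counts regressed against a few of the other metrics. The file name, column names, and the choice of a plain linear regression are all assumptions made for illustration.

```python
# A minimal sketch of predicting daily orders from other restaurant metrics.
# The data file and columns are hypothetical.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("restaurant_daily.csv")       # hypothetical daily metrics log

X = df[["day_of_week", "avg_order_size", "order_accuracy", "avg_review_score"]]
y = df["orders"]
model = LinearRegression().fit(X, y)

# Predict tomorrow's order count from expected feature values.
tomorrow = pd.DataFrame([{"day_of_week": 5, "avg_order_size": 27.50,
                          "order_accuracy": 0.96, "avg_review_score": 4.4}])
print("predicted orders:", round(model.predict(tomorrow)[0]))
```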

Data Transformation through Data Strategy

Grubhub had a data strategy and collected data for years before COVID hit. This allowed them to make better and faster business decisions when the emergency arose. Companies without a solid data strategy (measuring important, high information data as a matter of doing business) may do fine when the sun shines and skies are blue, but often lack resources to deal with crises.

Baseball Data for Machine Learning

Link to Pitcher Data

Here's a dataset that I use to predict player performance for fantasy baseball. If you're interested enough, here's a CoLaboratory program that does the work (you'll probably need a Google sign-in, but the notebook is public). The data linked above has already been processed to calculate the relative value of each player using SGP. This is a topic that I'll probably spend some time discussing in the future.
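If you just want to poke at the data, here is a minimal sketch of loading it with pandas and ranking players by the pre-computed SGP value. The file name and column names are assumptions about the dataset's layout, not a description of the linked notebook.

```python
# A minimal sketch: load the (hypothetical local copy of the) pitcher data and
# show the top players by the pre-computed SGP column.
import pandas as pd

pitchers = pd.read_csv("pitcher_data.csv")
top = pitchers.sort_values("SGP", ascending=False).head(10)
print(top[["Name", "SGP"]])
```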