Self-Publishing: Typesetting and Tools

No-cost Self Publishing Tools that I Use

In previous entries I have described at some length the process I use to get to the point where I start capturing content into my computer. Now that we’re actively writing, I want to touch on some of the things that I think are important to know about using these tools.


The word typesetting goes back to the early days of the printing press where physical movable type elements were set into place by people known as “compositors” to create a page. One funny piece of trivia about this process is that the compositors were extremely skilled at reading text backwards, for that is how they had to set the type for it to print in the correct direction! There was an element of creativity in typesetting too, for the objective was to create an optimal reading experience for the ultimate consumer of the text. As there were many variables under the control of the compositors to achieve this objective, there was sometimes a type of signature that the individual compositors would leave in the printed text.

Word Processing and Typesetting

This may be a controversial opinion, but in my case it is based on experience. I submit (very respectfully) that Word Processors like Microsoft Word are not really intended to perform typesetting. Word Processors were originally intended to replace the typewriter and thus move the office place into the computer era more smoothly. Today’s Word Processing software is far more capable than the early word processors (which used mono-spaced fonts, just like typewriters) but it is still not a native tool for typesetting. There are templates that can be used inside Word for this purpose, but they seem complicated.

To address the challenges of typesetting on a typewriter (a problem that later spilled over to the Word Processor), Donald Knuth created the TeX system. TeX spawned a family of descendants such as LaTeX. The TeX system was always difficult to learn, but it earned its place in the world of academia for published papers and became something of a typesetting standard.

TeX with a Human Interface!

I use the software tool LyX not just for its typesetting capabilities, but also because it has what I consider to be advanced writing capabilities. Here are some of the reasons I’ve used LyX for all my books:

  1. It is available on pretty much every platform. I actively use LyX on Linux and MacOS X (it exists on Windows too, but I don’t do Windows).
  2. It is free to download. I think it is an amazingly powerful tool, perhaps one of the best to come for free. It’s also pretty intuitive and has some good tutorials.
  3. It has powerful typesetting features. It is very easy to switch the size and type of the generated output (I output a .pdf file for my printed materials and a .html file when converting to e-book formats). I also vary the settings for these two published formats, such as the page size, font, headers, page numbering style, etc. There is a very wide range of format options available and they’re very easy to apply to the whole document.
  4. Navigation. LyX has a great Navigation feature that allows one to jump between sections and chapters in the text at the menu level. I use this a lot and it saves me much time. This is far more convenient to me than an MS Word Table of Contents. Because of this feature, one of the first things I often do in LyX is create the Sections and Chapters so they all show up in the Navigate menu.
  5. Writing Metrics. MS Word has all the standard writing metrics (number of words, number of characters, etc.), but they often are a few clicks away. I like it that LyX puts the metrics (as well as spell/grammar check) at the top level of the menu.
  6. The Graphical Interface is simple and easy to work inside.
A view of LyX.

E-Book Generation

One of the weaknesses of LyX is that it doesn’t seem to have a clean way to export a text as an epub or other ebook file format. Perhaps this is acceptable to LyX, however, because the ebook formats aren’t strong on typesetting anyway.

So when I have completed editing of my LyX document and am ready to create an ebook that I will use to refine the editing on the Kindle, I export from LyX to html and then import into Calibre. From Calibre I then convert the html file to an epub file which I then simply email to my Kindle. One of the few things I do inside the Calibre tool before conversion is to load my cover art (if it is ready). Then the converted epub document will have my cover art attached.
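For anyone who prefers to script the export-and-convert steps above, here is a minimal sketch in Python. It assumes the `lyx` and `ebook-convert` (Calibre) command-line programs are installed and on the PATH. The function names and file names are hypothetical, and `xhtml` is the export format name LyX documents for its HTML output, so check it against your own version.

```python
import subprocess
from pathlib import Path


def build_commands(lyx_file, cover=None):
    """Return two command lines: LyX -> HTML, then HTML -> EPUB.

    Assumption: LyX writes its HTML export next to the source file
    with an .xhtml extension.
    """
    source = Path(lyx_file)
    html = source.with_suffix(".xhtml")
    epub = source.with_suffix(".epub")
    export_cmd = ["lyx", "--export", "xhtml", str(source)]
    convert_cmd = ["ebook-convert", str(html), str(epub)]
    if cover is not None:
        # Attach cover art during conversion, if it is ready.
        convert_cmd += ["--cover", str(cover)]
    return export_cmd, convert_cmd


def make_epub(lyx_file, cover=None):
    """Run both steps in order and stop at the first failure."""
    for cmd in build_commands(lyx_file, cover):
        subprocess.run(cmd, check=True)
```

Calling `make_epub("book.lyx", cover="cover.jpg")` would run both programs in order. Emailing the resulting file to the Kindle remains a manual step in this sketch.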

The epub format is convenient as well if you wish to send your work around to select others for editing and evaluation. This is because pdf files are hard to read (for me at least) on the phone or computer and pdf files don’t load nicely into the Kindle. The epub file, however, is much easier to read on the Kindle and works exactly like an ebook that you purchase (text size can be increased or decreased, etc.).


I have used the free image editing tool, GIMP, for many years, so when I’m writing a book, I tend to want to do my own illustrations. GIMP is very similar to Photoshop in the way it works. My process is fairly simple.

  1. I draw out line art that I then scan into my computer. Generally I ink the line art before scanning.
  2. I import the scanned pdf or jpg file into a GIMP document under File->Open as Layers. I use layers extensively in the artwork because this creates the image in a modular sort of way that allows great flexibility in editing the image.
  3. I remove the white background from the imported image layer using GIMP’s fuzzy select tool. There are many tutorials for this, but in general, the objective is to remove all the white background from the line art layer so that layers beneath it will show through.
  4. I generally create a “background” layer and fill it with what I imagine will be my base background color. I also create a layer above the background layer where I color in between the black lines of the line art.
  5. Creating the cover art for a book is sometimes tricky, but if you use a self-publishing house for your book printing, they often provide a template that is the right size. Other things to think about: make sure your image is 200 to 300 dpi and make sure your fonts look clean. Sometimes I’ll make the cover art twice the size I need for the book and then scale it down to 9×6″ or whatever so that everything looks sharp.
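The dpi advice in the last step comes down to simple arithmetic: pixels = inches × dpi. Here is a minimal sketch, assuming a front cover 6 inches wide and 9 inches tall. The function name and the `oversample` parameter are made up for illustration, and a real cover template from a printer will also include the spine, the back, and any bleed.

```python
def cover_pixels(width_in, height_in, dpi=300, oversample=1):
    """Pixel dimensions (width, height) for a print image sized in inches.

    oversample=2 corresponds to drawing at twice the final size
    and scaling down afterwards.
    """
    width_px = round(width_in * dpi * oversample)
    height_px = round(height_in * dpi * oversample)
    return width_px, height_px


final_size = cover_pixels(6, 9)                  # (1800, 2700)
working_size = cover_pixels(6, 9, oversample=2)  # (3600, 5400)
```

So a 6×9″ front cover at 300 dpi needs a canvas of 1800×2700 pixels, and working at twice that size means 3600×5400 pixels.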

Cover Art Buildup Example

Line art layer from an early sketch for my upcoming book cover art
Line art layer with background layer. Note the white wasn’t removed from the line art in the whites of the eye
Adding a “color” layer above the background layer for the detailing of the Iris of the eye.
One more color layer above the Iris to detail the reflection in the Pupil
Assembled Cover Art including the spine and back material. Still a work in progress.


Self-Publishing: Thoughts on the Character Journey

This blog entry is a bit of a deviation from my plan because in the last entry on the writing process, I discussed the act of “emerging” characters and felt that I needed to go deeper into my thoughts on this important process. So we’ll start with how Hollywood (and many others) think about emerging their leading characters.

The Hero’s Journey

The Hero’s Journey is a well-worn approach to developing a character’s emergence throughout the narrative of a story or (especially) a movie. It has been well-described by mythologist Joseph Campbell in his book The Hero with a Thousand Faces, and time has made it very clear that it can be a very convenient pattern for many authors and screenwriters because it is attractive to consumers and inspirational. And… it is easy to disguise so people don’t get sick of it. See the graphic below. Perhaps some lives follow this cycle of growth, but my thought is that perhaps not so many as we would imagine. We know the lives of Luke Skywalker and Harry Potter follow this exact journey, because George Lucas admitted as much and in places it appears that JK Rowling used it as a template. But do normal characters’ lives follow this approach, and if not, is strict adherence to the Hero’s Journey going to make your characters “exciting but not real”?
Simplified Hero’s Journey Illustration by Reg Harris

Making Characters Real AND Interesting?

Here are a few of my concerns with over-reliance on the Hero’s Journey in literature.

  1. The Journey vs. the Character. Many times the reader can be distracted (intentionally?) into believing that the manner in which the character undertakes the journey and the problems the character attacks to finally be able to “return” are more important than the character’s “becoming real”. I think this is a true challenge in our current era where we value authenticity publicly with our lips, but then destroy it privately with our actions. I probably don’t have to give many people examples of this to gain consensus. Perhaps this is one of the largest disappointments of our last ten years or so of political polarization: the death of the authentic individual.
  2. Cultural Homogeneity. I’m not convinced that the Hero’s Journey maps well across cultures. I sometimes wonder if patterns like this are quietly destroying non-conforming cultures.
  3. The Dark Hero’s Journey. By its nature, the Hero’s Journey leans towards a selfish exercise followed by the main character, with supporting characters scattered around to further their journey. This isn’t always bad and can be a tool to reveal something interesting in a character, but sometimes the Hero’s self-centeredness can devastate all around them in real life rather than provide salvation. This is a counter-pattern that probably doesn’t sell well when applied to books and movies, but unfolds around us all the time. Rewrite the stories of the people who have created the most destruction with their lives using the Hero’s Journey pattern and you’ll find that they often fit, but the outcomes are far from heroic (I’ll leave that as an exercise for the reader). Perhaps this may challenge our sense of morals and virtues (and its weaker cousin “values”), but that is a good thing to be thinking about when trying to help your character “emerge” in your story.
  4. Emergence of REAL Characters. Fitting characters (and supporting characters) into a pattern reduces them to just a token that the author hopes to use to satisfy their OWN goals. If an author seeks to understand and introduce readers to a character that they would love to know better themselves, they are not likely to want to present a two-dimensional pattern. However, if they seek to sell a few books and maybe even get a movie deal, maybe the pattern is the fastest way. But I’d submit in most cases it’s not lasting.

An Example of REAL Character Emergence

The original cover for Margery Williams’ “The Velveteen Rabbit,” with illustrations by William Nicholson.

In The Velveteen Rabbit, we see Margery Williams emerging her characters, toys that have a desire to become REAL, in loving and careful ways. What she does not do is hide the growth process under the gloss of the Hero’s Journey. She shows the struggles and disillusionment and the sadness that co-exist with a character’s true journey, as opposed to their idealistic journey. Getting to REAL in The Velveteen Rabbit is costly. So it is in the real world too. Hank Williams put it well in his frequently-covered song, “No matter how I struggle and strive / I’ll never get out of this world alive”

From The Velveteen Rabbit

“Real isn’t how you are made,” said the Skin Horse. “It’s a thing that happens to you. When a child loves you for a long, long time, not just to play with, but REALLY loves you, then you become Real.”

“Does it hurt?” asked the Rabbit.

“Sometimes,” said the Skin Horse, for he was always truthful. “When you are Real you don’t mind being hurt.”

“Does it happen all at once, like being wound up,” he asked, “or bit by bit?”

“It doesn’t happen all at once,” said the Skin Horse. “You become. It takes a long time. That’s why it doesn’t happen often to people who break easily, or have sharp edges, or who have to be carefully kept. Generally, by the time you are Real, most of your hair has been loved off, and your eyes drop out and you get loose in the joints and very shabby. But these things don’t matter at all, because once you are Real you can’t be ugly, except to people who don’t understand.” – Margery Williams’ “The Velveteen Rabbit”


Self-Publishing: The Writing Process

My goal in this episode of the Self-Publishing series is to touch on both the physical and mental aspects of the actual writing process. Perhaps some of this is taught in creative writing workshops, but other elements are just best practices I’ve stumbled upon.

As I’ve mentioned in previous entries, I try to delay the actual no-kidding writing process as long as I can because my belief is that once I’ve started typing, many of the creative decisions I need to make with the work are in the past. In the non-literary world of creative problem solving we often talk about diverging then converging. In my experience, once I’m writing on the computer, I’m in converging mode. Or if I’m not, I need to be or I’ll never finish.

So here are some of my thoughts about this process of converging onto a publishable work:

  1. Building Discipline. One of the more important aspects to actually finishing a book is the intentional discipline that you design into the process. What does this mean? Essentially, the author needs to consistently generate content for the book. My typical approach is to use “Streak” applications on my iPhone and a spreadsheet to capture word count. Both of these together help to build the habit that I need to be able to complete the book. See here and here for discussion of habit-streaks. My typical approach to build the writing habit is to create a simple daily task to “write 100 words” in the hopes that I won’t be daunted by the size of the task. Then, if I’m lucky and I’m feeling inspired, maybe I’ll write more, perhaps many more words. Just getting to the keyboard is often the main barrier. Another thing that I do to keep the barrier to writing low is keeping my laptop out and available in a pleasant part of the house that I pass through often (in my case, the kitchen). This is yet another thing that seems to keep the barrier to the act of writing as small as possible.
  2. Maintaining Enthusiasm. I have started a great many novels where I ran out of enthusiasm for the story and the characters after writing the first hundred pages or so. Since that time, I’ve found that preparedness provides a major mitigation to the risk of losing enthusiasm. This is because all the work that I do before starting to type on the computer helps organize my thoughts and fill in critical gaps in the creative portions of the process.
  3. Unfolding Plot Lines. As stated right above, my goal is to have a good idea of how the plot of the book will unfold during the “paper” portions of this process. Having that captured just means that when I’m physically in writing mode, all I have to do is fill in the details. However, as I’ve heard from others, sometimes the plot emerges as I’m writing in ways that truly surprise me. To make sure I don’t lose this, I drag my spiral notebook around with me everywhere while I’m actively writing. Many times, I feel that ideas that come to me when I’m daydreaming or on my rowing machine greatly improve where I thought the plot was going. Capturing these surprises then gives me something to include in my one-hundred words habit the next day.
  4. Emerging Characters. Just like the emerging plot surprises, the characters in my books often grow organically while I’m writing. I try hard not to be rigid as I define and grow the characters, because sometimes they are trying to tell me something better about themselves. I usually capture these kinds of thoughts in questions, like “what does character A think about when he is lonely?” or “why does character B feel threatened by character C?” Often I wasn’t thinking about these kinds of human descriptions of the characters when I was in the creative stage, and answering the questions helps me to uncover hidden things about the characters I never knew.


Self-Publishing: Setting up a Project for Success

The previous entries in this series on self-publishing have described the creative process and how to organize it. As I mentioned, my preference during that brainstorming phase is to stay off the computer and rather, hand-write my work. For me, this helps me exercise the right brain more than the left brain. This process can result in many pages — hopefully organized — in a spiral notebook as well as a chapter outline. It might be useful to even do some free-writing to try out different ideas on an opening, and if I do this, I also do this on paper.

However, once I feel like I have a good direction and am ready to shift to the computer, this is when I think about setting up the project formally. This means a few things to me; here are some of them.

Organizing for Success

  • Software Tools. Before I’m ready to go all in, I always ensure I have my writing tools installed on my computer and that they’re the latest version. Sometimes I will work on a project across two or three different computers, and one problem I’ve run across is that sometimes, version A of the software won’t read something developed in version B. Really, this has happened to me, but it’s probably not terribly common. I use both Linux and Mac in my workflow and occasionally the newest version of LyX or GIMP for Linux is a version or two different from the newest version on the Mac.
  • Data Protection. As a computer person, I have a network storage device attached to my router that serves as a data storage location for all computers in my house. Not everyone is this paranoid about data loss, but my business drives the need, so I make use of it for my writing. This not only keeps you from losing your work, but also helps ensure that you have the correct version in place every time you write on any computer that you might use. Sometimes to mix things up I might write on my Macbook on the back porch while staring at the mountains and I don’t want to write on an older version of the book!
  • Text Versioning. Additionally, I may sometimes want to hit the “undo” button and bring back the text from a previous version. On Mac some people might use the Time Machine or something like Dropbox to do this, but I use a software versioning system called Git. I use this also when I write code, so it is very comfortable to me. If this interests you, here’s a page where you can read about using Git with LyX (the writing tool I use); the discussion is at the very bottom of the link. It’s actually very simple.
  • Word Count record. When I set up a project, one of the first things I ensure that I make is a word count spreadsheet. Anyone who has read my blog realizes that I like data, of course, but actually I think this is a best practice. Every day I record the date and the number of words the LyX tool tells me I have written. This helps me be disciplined in my words per day goal (usually I state 100 words/day as my goal, but in reality, once I sit down to write, I often generate many more). It also gives me a visualization of my writing rates. This can help me recognize if I’m slowing down because the slope of the plot of words vs. days decreases. If I understand that my production has dropped, then I can think about reasons why and how to correct the issue.
  • Illustration prep. If I plan to do my own illustrations, I might immediately start doodling on some line art for book illustrations or cover art. Sometimes this provides me some insight on scenes in the book that might be worth accentuating. The idea is that if I’m interested in doodling about a scene, that might indicate that it’s more important than my left brain tells me it is. This means I may want to scan the line art into my computer. A scanner for line art is pretty essential for anyone who wants to do their own illustrations. The open source image editor, GIMP, is also essential. If you plan to illustrate, you’ll want to ensure at this phase that you have a working version installed on your computer. Don’t worry, I’ll definitely be sharing my process for illustrations in later entries.
  • The writing environment. Selecting an appropriate environment to do your writing is really important. Why is that? I know in my experience, if I’m not comfortable where I’m writing, I never want to go write. That can result in huge gaps in your writing and prevent you from ever finishing. Just as with everything else in this entry, the environment is an important part of the “commitment system” that you want to build before you get into the active writing phase. If anything in this system isn’t running smoothly, I find that I get distracted and run out of steam. I think different people have their own things that make them comfortable, but for me these are important:
      • The computer can be quickly woken up and the LyX writing software opens fast.
      • There is good light where the computer sits. This is mostly because good lighting puts me in a good mood!
      • The area is uncluttered. Why? I suspect that most writers are like me in that if they see clutter it distracts them. Sometimes my brain even *wants* the distraction, so I try to prevent it.
      • The coffee pot is nearby. Though this is a distraction, it is a very useful one!
      • No one in the family is nearby. As this is very hard to do with three kids and a wife, I ensure that this is the case by doing my writing before they all wake up or when they’re not around.

Don’t Forget Why You’re Organizing!

Again, don’t forget that the overall purpose of aligning all of these elements is to be able to meet or exceed the important daily word count goal.
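The word count record described above can be checked with a few lines of Python instead of reading the slope off a spreadsheet plot. This is a minimal sketch under stated assumptions: the log is a list of (date, running total) pairs, the numbers are invented for illustration, and "slowing down" is taken to mean that the rate over the last few entries is lower than the rate over the whole log.

```python
from datetime import date


def daily_rate(log):
    """Average words per day between the first and last entries."""
    (start_day, start_words), (end_day, end_words) = log[0], log[-1]
    days = (end_day - start_day).days
    return (end_words - start_words) / days if days else 0.0


def is_slowing(log, window=3):
    """True when the last `window` entries show a lower rate than the whole log."""
    return daily_rate(log[-window:]) < daily_rate(log)


# Hypothetical log: (date, total words so far)
example_log = [
    (date(2022, 6, 1), 0),
    (date(2022, 6, 2), 400),
    (date(2022, 6, 3), 900),
    (date(2022, 6, 4), 1300),
    (date(2022, 6, 5), 1400),
    (date(2022, 6, 6), 1500),
]

overall = daily_rate(example_log)      # 300.0 words/day
recent = daily_rate(example_log[-3:])  # 100.0 words/day
```

In this made-up log the overall pace is 300 words per day but the last three entries average only 100, so `is_slowing(example_log)` returns `True`. That is the cue to think about what changed.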


Self-Publishing: Research and Note Taking

Since I’m thinking chronologically about this topic, once I have brainstormed an idea, done a bunch of free-writing, learned about my characters, and built an index of chapters, I generally start with the actual writing on the computer. This is typically the point where I start to suspect that my knowledge of the setting falls far short of the character’s knowledge!


At the point where I become humbled about my weak knowledge of the location, the timeframe, significant people in the era, behavioral norms, etc., of my project, I become a dedicated researcher. Here are some thoughts on this:

  1. Capture Lists of Research Needs. As “Research Needs” pop up, I write them in my spiral notebook. This allows me to build a decent workflow of things I need to look into. Then I can set research goals (maybe something like, research two of the topics on my list every day?). Sometimes I will free-write about my research too. If the setting is in the distant past, I might create dialogue between characters in that time about subjects that are unusual. For instance, what might two shepherds five-hundred years ago discuss about a shooting star they just saw? Sometimes really interesting material comes out of this.
  2. Common Research Subjects. When I’m writing about a time or culture that I’m not personally experienced with, I usually spend time researching foods they ate, customs, ceremonies, flora and fauna of the region, etc. These basics seem to show up in descriptive paragraphs a lot. Perhaps readers won’t know, but if I think something in one of my books is inaccurate, it bugs me. Now the above refers primarily to fiction writing. Most of my work has been in novels and collections of short stories, but I imagine that this advice applies to non-fiction too. If you want to communicate a non-fiction topic well enough to convince people of your expertise, you are probably going to need to understand many, many different dimensions of your topic and then connect them all in your book. Mapping these different directions of research out early will also aid your writing (writer’s block, to me, comes from lack of confidence in what to say).
  3. Organize the Research Free-Writings. I categorize my research “writings” by their label and I try to keep related labels close together in my spiral notebook. This seems like the best method I’ve used… Notecards were hard to organize and typing the notes into a computer (a spreadsheet maybe?) seemed too left-brained. Sometimes when I go through my spiral notebook (it comes with me to sports practices, coffee shops, church, etc.) I have cool ideas that I attribute to being written on paper.
  4. Don’t be afraid to conduct “just in time” research. Very often while I’m writing I find myself unsatisfied with how I’ve explained some technical detail and this drives me back to researching that specific topic. This often arises when trying to fill in some “color” in the story by describing small but very visual events. Often I want to make sure that I’m very accurate on the details of this “visual insert”. You might understand what I’m talking about if you’ve ever seen the “Lord of the Rings” movies. The wizard Gandalf is trapped by his adversary on the top of a tall tower and is in great peril, but suddenly the camera zooms in on a small moth and we see it flying in great, beautiful detail for a few seconds until Gandalf traps it with a quick movement, speaks some instructions to it, and lets it fly away. See here for a YouTube video of this. Peter Jackson uses this moment of seemingly-unrelated beauty to create some mystery, relieve some tension, or just refocus the viewer’s brain momentarily. The Harry Potter movies use this type of visual insert quite frequently, and I try to do this in my books from time to time too. If you use this element in your writing, mastering the details is important for pulling it off!


Self-Publishing: Project Inspiration and Organization

It probably goes without saying that this topic is the most important one for someone who is interested in publishing their own work. How do you get started? Here are a few quick rules of thumb that come from my own experience.

  1. Be receptive! I had been trying to write novels for years, starting even with my time in college. Many manuscripts had been started and pushed forward, only to be abandoned as I grew uninterested in the characters and skeptical of my ability to recreate the setting well enough. THEN, one night as I was telling a story to my son at bedtime, it struck me that the story I had been telling him (Scheherazade-style, for one must maintain a set of stories for bedtime!) was growing interesting to me. What to that point I had not recognized became quite clear! I needed to organize and capture these stories as a gift, both to him, but also to others struggling to come up with bedtime stories for their demanding child. Maybe this sounds silly, but my response to this unexpected inspiration helped push me through the many hours of writing and illustrating my first book, The Incredible Adventures of Pirate Zach.
  2. Seek Inspiration and Don’t Judge it. It’s hard to write about something that doesn’t interest you. I have started writing projects about subjects where I found myself curious about the details behind an event that I’ve read about in the newspaper, found in an old book, or even speculated about in my head. The writing of the book becomes the mechanism for “learning” the motivations, discouragements, manipulation, and loves that lie behind some “headline” event. Free-writing is something I did a lot in college (perhaps a professor had inspired this? I can’t recall) when something came to me that was interesting. Free-writing is essentially (to me) trying to capture thoughts about a fascinating subject without any organizational or structural restrictions. Why is this thing interesting? What might have happened to inspire this thing? How many people knew about it? And so on. I think there are two keys to this, though. a) Don’t be judgmental of your free-writing! Let it flow unimpeded by your inner librarian. b) Be disciplined with daily writing. Even at the free-writing phase I set small goals like one notebook page of material per day. Then, it’s a pretty small barrier to sit down and do it. And MAYBE I’ll write ten pages once I force myself to start.
  3. Brainstorming on the Written Page seems to Unlock Insights. This sounds complicated, but I have found that brainstorming during an initial phase of planning on a new idea for a book is much more impactful if I write it on paper instead of capturing it on a computer. Perhaps this is because my mind is less creative when it’s looking at a computer screen (indeed, I do quite a lot of this!) or maybe there’s some other reason. I find that this is where I “learn” about my characters. I try to describe their passions, their deep motivations, what they need to learn, why they’re annoying, and whether they are receptive to growth and redemption. I also can use these hand-writing sessions to unfold why the character is interesting, what in their life is worthy of being captured in a book, etc. Just like during my “writing phase”, I put some sort of daily goal on these sessions and generally fill up a spiral notebook, often before I start actually writing.
  4. Notecards help with Organization. One thing that I tend to like to do is create a high level table of contents before I start writing. Eventually, I will capture this in the LyX typesetting software I use, but the first thing I tend to do is create one notecard for each chapter. Then on that notecard I come up with a stab at a chapter name, and below that I capture why this chapter is important to the book. Then sometimes I sort the order of the chapters until I get something that has about the right flow that I’m looking for. Only at that point do I go through and type in all the chapter names in LyX.
  5. Start writing in LyX. This is my typesetting tool and I think it’s amazing. It’s also free and is available on Windows, MacOS, and Linux. I’ll talk more about this later, but LyX is the backbone of the bulk of my writing phase. If, however, I haven’t done the above steps to get myself enthused, the writing doesn’t flow.
“Power of Words” by Antonio Litterio, from Wikimedia Commons (Creative Commons Attribution-Share Alike 3.0)


New Blog Series – The Ins-and-Outs of Self-Publishing

As I have shared with a few people, I’m rapidly nearing the end of the writing and editing portion of my next novel. As I was working on this today, it struck me that I have learned quite a lot about the self-publishing journey and that perhaps this knowledge could be useful to others. So here’s my goal.

Goal: Lay out the process for self-publishing from the bottom up to provide lift for folks who might be thinking about taking this journey on for themselves. I’ll use my current project and some past projects to provide examples but only when absolutely necessary. Below are what I plan to write about, in some order:

Why? Well, I don’t find a lot of good discussion on the tips and tricks (and motivational techniques) for the complete end-to-end self-publishing journey. As I have gone down this path multiple times in the past and it is fresh in my mind due to my upcoming book, I hope that my notes may be helpful to others who might suspect that this is too difficult or expensive. Hint: it’s not. It just takes time, thoughtfulness, and discipline to complete.


Is Google’s Chatbot Sentient? “Logical” Reasons to Disagree

“Chatbot” from Wikimedia Commons

I have some strong reasons why I think it’s useful to weigh in on the recent drama around a Google Ethics Engineer’s declaration of the LaMDA product having reached sentience. Slate has a great article about this that I think may bring you up to speed if you’re interested.

Slate’s position on why the assignment of the sentient label to the chatbot was misguided revolves around LaMDA’s complete reliance on human inputs and foundational language models. My assessment extends their position in a different direction and I’ll explain why.

Deductive Logic

All of these foundational models rely on two different types of logic that are common to the software community. The most common is called deductive logic and describes the process where the software compares the truth (or lack) of multiple assertions to determine actions to take. This is a pretty high level explanation (forgive me) which summarizes a significant body of research and work in deduction, but in general, deduction describes the application of rules and logic.

Inductive Logic

Inductive logic is useful for drawing inferences or conclusions from historical observations. If you draw eight black marbles out of the bag in your first eight tries, you might infer from this data that the bag is full of black marbles. Induction has experienced a resurgence in software due to the recent interest in deep learning and other machine learning techniques. Machine learning is a form of inductive logic where models are trained on historical data, which allows the machine to infer likely outcomes from currently sensed parameters. So in the example of my weather data systems, I have sampled parameters like temperature, pressure, humidity, luminosity, etc., for years. I have also captured when rain occurs (easy to do in Tucson… it doesn’t happen much) and can then label every example of weather data with “rain” or “not-rain”. THEN, if I want to predict whether it will rain at some time in the near future, I run inference on my trained weather-rain model using the current values of the weather sensors. This is a very simple description of how machine learning works.
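
To make the weather example a little more concrete, here is a minimal sketch of inductive inference in Python. It is not my actual weather system: the readings are invented, the function names are hypothetical, and I have assumed a simple nearest-neighbor vote in place of a properly trained model. The idea is only to show labeled history going in and a “rain” or “not-rain” inference coming out.

```python
import math

# Hypothetical labeled history: (temperature_C, pressure_hPa, humidity_pct) -> label.
# These numbers are invented for illustration only.
HISTORY = [
    ((38.0, 1008.0, 12.0), "not-rain"),
    ((35.0, 1010.0, 18.0), "not-rain"),
    ((40.0, 1007.0, 10.0), "not-rain"),
    ((33.0, 1011.0, 22.0), "not-rain"),
    ((27.0, 1002.0, 70.0), "rain"),
    ((25.0, 1001.0, 78.0), "rain"),
    ((29.0, 1003.0, 65.0), "rain"),
]


def feature_ranges(samples):
    """Return the (min, max) of each feature so readings share one scale."""
    return [(min(column), max(column)) for column in zip(*samples)]


def normalize(sample, ranges):
    """Rescale each feature to 0..1 using the ranges seen in the history."""
    return [
        (value - low) / (high - low) if high > low else 0.0
        for value, (low, high) in zip(sample, ranges)
    ]


def predict(current, history=HISTORY, k=3):
    """Infer a label by majority vote of the k most similar past readings."""
    ranges = feature_ranges([features for features, _ in history])
    target = normalize(current, ranges)
    nearest = sorted(
        history,
        key=lambda item: math.dist(target, normalize(item[0], ranges)),
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


print(predict((26.0, 1002.0, 72.0)))  # cool, humid reading
print(predict((37.0, 1009.0, 14.0)))  # hot, dry reading
```

With more sensors and years of samples, the same pattern holds: the conclusion is only as good as the history it was drawn from, just like the bag of marbles.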

Combinations of Logic

Much current software relies on traditional deductive logic that decides when to incorporate inductive inferences in order to solve problems most effectively. I always imagine traditional software logic that is evaluating and connecting hundreds and hundreds of small trained machine learning models. This is an example of the combination of these two kinds of logic. This, very simply stated, is what the large foundational models like LaMDA and GPT-3 are doing. The difference is that they are generally using deductive logic rules and VERY LARGE trained models. Most of these foundational models are so large and computationally expensive that most normal people can only use them through the toy applications provided by Google or OpenAI. The very large body of language used by these foundational models allows them to do incredible inference based on language created by real humans. All the text in Wikipedia is an example of some of the language used to train these models. Querying these models with questions from humans (such as the Google employee) can yield surprising, even spooky results. Deductive logic rules can eliminate ridiculous or meaningless responses.
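
Here is a toy sketch of that combination in Python. It is in no way a depiction of how LaMDA or GPT-3 are built. The “model” is just a frequency count over a handful of invented exchanges, and the rules and function names are hypothetical. What it shows is the pattern: an inductive step proposes candidate replies from past data, and explicit deductive rules decide which candidate, if any, is allowed out.

```python
from collections import Counter

# Inductive part: invented past exchanges of (prompt, reply).
PAST_EXCHANGES = [
    ("hello", "asdf"),
    ("hello", "asdf"),
    ("hello", "hi there"),
    ("weather", ""),
    ("weather", "it is dry in Tucson"),
]


def train(exchanges):
    """Count how often each reply followed each prompt."""
    model = {}
    for prompt, reply in exchanges:
        model.setdefault(prompt, Counter())[reply] += 1
    return model


def infer(model, prompt):
    """Return candidate replies for a prompt, most frequently observed first."""
    return [reply for reply, _ in model.get(prompt, Counter()).most_common()]


# Deductive part: explicit rules that must all be true for a reply to be used.
RULES = [
    lambda reply: len(reply.strip()) > 0,  # the reply is not empty
    lambda reply: " " in reply.strip(),    # the reply has more than one word
]


def respond(model, prompt, fallback="I do not know."):
    """Use the first inferred candidate that satisfies every rule."""
    for candidate in infer(model, prompt):
        if all(rule(candidate) for rule in RULES):
            return candidate
    return fallback


model = train(PAST_EXCHANGES)
print(respond(model, "hello"))
print(respond(model, "weather"))
print(respond(model, "sentience"))
```

Notice that the most frequent reply to “hello” in the invented data is nonsense, and it is the rule layer, not the counting, that screens it out.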

What’s missing? (Who knows what lies in the heart of a machine?)

Despite the fact that these foundational models can be VERY useful, they’re missing something major that prevents them from truly understanding language. How can I say this with confidence?

Abductive Logic is What’s Missing!

It is easy to point out that machines do not (and will not in the near future) have the capacity for abductive logic. Abduction describes an ability that humans have to make an observation Q and conclude that some general principle P must be the reason that Q is true. Notice that this is quite different from deduction and induction. The complexity of the various principles in the world makes abduction very difficult to perform. Sherlock Holmes was a renowned expert in using abduction: he would see, for instance, a wedding ring that was shinier on the inside than the outside and conclude, with no further information, that a ring removed frequently might have that appearance. Machines are not able to make these kinds of intuitive “leaps”. Our current, modern view states that science itself is an example of abduction. We seek hidden principles or causes from which we can then deduce the observable facts, for example: “Frequently removing a ring might explain why it is shiny and clean on the inside but not the outside”.

There is plenty of research out there telling us that machines cannot perform abductive logic. Part of the reason is that in abduction, a likely hypothesis needs to be inferred from a nearly infinite set of explanations. Something in the human brain protects us from getting locked in the infinite loop required to evaluate all these explanations. It is likely to be some mashup of intuition and mental models of rules and value systems that we use to jump to the most likely causes to explain the data. To go deeper, Mindmatters has a great discussion of all these concepts here. They also have a three-part series on “The Flawed Logic Behind Thinking Computers” (Part1, Part2, and Part3). There are many more articles out there that explain this gap in machine intelligence, including this one from VentureBeat.

Abduction and Natural Language

There is a growing body of work that indicates that abductive reasoning is part of the reason why humans can understand language (NeurIPS Proceedings link). Some of this is due to the need to interpret, or decode, errors in language. A famous example comes from Don Quixote, where Sancho Panza, Don Quixote’s assistant, says: “Senor, I have educed my wife to let me go with your worship wherever you choose to take me.” Don Quixote, immediately identifying the improper usage, replies, “INDUCED, you would say, Sancho. Not EDUCED.” By our definition of abduction, we can see that here Don Quixote uses abductive logic when he adopts the hypothesis that “induced” is the intended word given the context and the similarity between the two words. According to Donald Davidson, this kind of abductive interpretation can occur in natural language understanding when:

  1. Applying a hypothesis to understand new names or labels
  2. Revising prior beliefs or interpretations about particular phrases
  3. Altering interpretations of predicates or other grammatical constructs to fit the context


In the light of the growing numbers of applications of Machine Learning, there has been much more discussion of deductive and inductive reasoning than there was even ten years ago. It’s likely you’ve seen some of this.

It does appear, however, that the understanding of abductive logic is lagging. Though there have been efforts to simulate machine abduction, it has yet to be accomplished and, for legitimate processing tractability reasons, is unlikely to be accomplished on traditional (not quantum) computing. This severely limits a machine’s capacity for true natural language understanding, which any sentient being would need in order to understand language and communicate. This also applies to chatbots and describes why they are just examples of the Chinese Room (or a human-language-speaking parrot), neither of which demonstrates understanding of the language emanating from it.

Organizing for AI&ML Success – from Conway’s Law to the CDAO

Here’s a topic that I have given a great deal of thought to after observing lots of examples of how companies organize to identify, sense, collect, and use their business data. In a nutshell, HOW a company chooses to organize their data strategy and teams determines how successful they will be in delivering business value through data. Why is this? Conway’s Law gives us the reasons…

Conway’s Law

In short, in 1967, Melvin Conway, a computer programmer, proposed that organizations design systems that mirror their own communication structure. This sounds very simple, but I’ll give some examples of why this provides really great insight into the power of architecting organizations around desired business outcomes.

First, why does this make sense?

Conway suggested that the architecture of products by organizations who are broken into functional competencies will tend to reflect those functions. For instance, an application developed by a firm with four functions: mechanical engineering, electrical engineering, software engineering, and signal processing will develop applications with distinct modular capabilities that reflect those functions. A module that manages thermal loads, center of gravity, control systems, structural sensing, and power will emerge and be developed by the mechanical engineering group. This module will interface to another module that contains embedded processing and memory through interfaces that carry power and sensors that provide data. This second module, of course, will be developed by the electrical engineering team. The software engineering team will develop a module that will be loaded into the electrical engineering’s processing system through some programming interface and will receive signals from sensors as well as elements within the mechanical engineering modules and will use logic to make decisions. The signal processing team will also develop a module that will be triggered by signals from the software engineering module and will provide outputs that interface with control modules in the mechanical engineering module. Phew! See below for a very high level visualization of how this might occur. Note how each department “owns” their own content and then someone (hopefully a systems architect or systems engineer) manages the interfaces.

Very high-level block diagram demonstrating Conway’s Law – Tod Newman, 2022

Conway’s Law and Data Science / AI&ML

I have seen Conway’s Law borne out over and over with regard to Data Strategy in an organization. Organization one (let’s say Mechanical Engineering) understands their business function well and is intent on optimizing for this function. They develop a strategy around data collection, storage, and analysis that helps them achieve their goals. Organization two (Finance, we’ll say) does the same thing. Then Organization three follows suit, and so on. Eventually what we have is 10-15 different data silos, each of which works relatively well for the owner (but each of which requires attention and sustainment, something that’s not always present). However, in traditional organizations (companies not named Uber or Google or SpaceX or similar) there is rarely a central figure like the systems architect who designed the complete business data system and who manages the interfaces. Therefore, Conway’s Law results in the isolation of multiple, locally-valuable data sources. Frequently, because these organizations design their data strategy to their own unique needs, there’s not even a clear way to connect these data stores!

Are there Solutions?

There are lots of examples of companies who have avoided the bulk of this negative effect by designing a centralized data strategy up front. As I alluded to earlier, these companies are often data firms that offer a service, like Google or Uber. They were born as data companies and developed from the ground up. If you’re not lucky enough to be a company that was born a data firm, however, there may be some possibilities, but I think they might be difficult and involve culture change management.

  1. Centralize the Data Strategy and Empower an Owner: This role has traditionally been called the Chief Data Officer and these days I’m noticing a positive trend towards redefining this role as the Chief Data and Analytics (or AI) Officer. Here’s a good explanation of the difference. This will have the effect of making the statement to the organization that data is now seen as a central business asset vs. simply a local asset. As the Harvard Business Review states, the trend towards naming CDOs or CDAOs “reflects a recognition that data is an important business asset that is worthy of management by a senior executive” and that it is “also an acknowledgement that data and technology are not the same and need different management approaches.” Note that redefining and centralizing the organization can leverage the positive aspects of Conway’s Law towards the goal of integrated, aligned data sources.
  2. Identify “low-hanging fruit” in your existing data silos for integration. You may be lucky and have a common key (employee number, part number, etc.) between two data silos that enables the data to be joined. This assumes that you can get permission from the silo owner to see the data, however, which might be a large assumption. Regardless, a demonstration of the power of integrated data could make the case for the difficult decisions and culture shifts (from local to collective ownership of data).
  3. Make a mandate. Jeff Bezos (legendarily) made his API Mandate at Amazon, which required all data and functionality to be exposed across Amazon through a defined interface called an Application Programming Interface (API). This interface managed both access to the data and insight into the structure of the data. It is said that this mandate changed the company and enabled their future high-value Amazon Web Services business.
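
The “low-hanging fruit” in item 2 can be sketched in a few lines of Python. Everything here is hypothetical: the two silos, the field names, and the numbers are invented, and a real integration would more likely use a database or a dataframe library. The sketch only shows how a shared key (a part number, in this case) lets two locally-owned data sets answer a question that neither can answer alone.

```python
# Two hypothetical silos, each owned by a different function.
# Field names and values are invented for illustration.
ENGINEERING_SILO = [
    {"part_number": "P-100", "failure_rate": 0.02},
    {"part_number": "P-200", "failure_rate": 0.10},
    {"part_number": "P-300", "failure_rate": 0.01},
]
FINANCE_SILO = [
    {"part_number": "P-100", "unit_cost": 12.50},
    {"part_number": "P-200", "unit_cost": 80.00},
    {"part_number": "P-400", "unit_cost": 5.00},
]


def join_on_key(left, right, key):
    """Inner join: keep only the records whose key appears in both silos."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


joined = join_on_key(ENGINEERING_SILO, FINANCE_SILO, "part_number")

# A question neither silo can answer alone: expected failure cost per unit.
for row in joined:
    row["expected_failure_cost"] = round(row["failure_rate"] * row["unit_cost"], 2)
    print(row["part_number"], row["expected_failure_cost"])
```

Note that P-300 and P-400 silently drop out of the result because each exists in only one silo. In practice, those unmatched records are often the first sign of how far apart the silos have drifted.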


If you’ve made it this far, then you probably have the gist of my argument. If you’ve skipped to the conclusion, here’s what I’d want you to know:

  1. Organizations that build a Data Strategy from scratch will fall into the Conway’s Law trap and are unlikely to have the ability to understand data interfaces.
  2. Conversely, a carefully-architected Data Strategy (everything from design of information to be sensed, sensing approach, collection, and application of Data Science, etc.) can be a surprisingly powerful lever for gaining business value. Some of the largest return on internal investments in process improvement I’m aware of inside large firms involve joining previously-unconnected data sources and gaining a new valuable insight for decisions, risk management, or even better understanding of the flow of business value from suppliers to the hands of the customer.
  3. It is hard to apply a new Data Strategy to an existing business culture. Unless you are leading an amazing business culture, it will require change management techniques (like John Kotter’s 8 steps) to succeed.
  4. An empowered role like the CDO, or better, the CDAO, may help this culture change and can make the kinds of “Bezos API mandates” that might be needed to succeed. It can also help with the next challenge, Sustaining the Data Business.

From Documents to Knowledge – Simple Ways of Building and Questioning Knowledge Graphs

Here’s an applied approach to the hard problem of what is referred to as “knowledge representation“, where we provide structures for machines to capture information from the world and represent it as knowledge that can be used to solve problems. There’s a long history of research into this challenging field and much of that research has failed to result in simple, approachable methods.

As someone who thinks hard about building intelligent assistants that enable more effective human decisions (rather than intelligent agents that make their own decisions), I have spent time and energy to approach the knowledge representation problem from this context. This means I work to build systems that can extract and build knowledge from sets of texts and documents that humans will never be able to read through. This system can then provide the human decision maker information visualized in a simpler way that will then improve their decisions.


Context: I was looking for a set of documents to demonstrate my techniques on around the time the Ukraine war was about to begin. As it turned out, there had been numerous reports and analyses developed anywhere from 6 months before the war began right up to the days before the war started.

Goal: Determine AFTER the war began if there was anything in the early analyses that predicted what was going to happen.

Outcome: As you’ll see, interesting predictions could be distilled out of “questions” presented to the knowledge graph.


  1. One of the hard problems with this kind of analysis is pulling data out of the texts that one finds scattered across the internet. I tend to use search engines to find files that I download and then process in bulk. Generally these documents are in PDF format, which makes them a bit harder to process. Automating accurate processing of PDF files is beyond this scope, but it’s a bridge that probably must be crossed by anyone interested in Natural Language Processing and Knowledge Representation.
  2. Knowledge Graphs: Building a knowledge graph isn’t nearly as difficult as it sounds, but it requires a few things. A toolkit like the Python Natural Language Toolkit (nltk) is very useful, as it has the necessary ingredients like sentence tokenization, word tokenization, and parts of speech classification. Here’s a great overview from a notebook on Kaggle, the data science competition site. One will first use all the downloaded texts to build a “master” knowledge graph that consists of Subject->Action->Object “triplets” built into a network graph. This graph will be incredibly dense, but what will emerge are central concepts that are frequently noted in the texts.
  3. “Questioning” the Knowledge Graphs: This may also be viewed as filtering central topics out of the master knowledge graph by asking questions of the graph. For instance, the question, “Will Russia’s invasion trigger an economic impact and increased immigration?” provides a filtered view of the master knowledge graph that looks like the below:
Ukraine War Knowledge Graph filtered by “Economic Impact” and “Immigration”

If you look closely, you will notice that the nodes (blue) are nouns and an arrow points from the subject to the object. The arrow is referred to as the “edge” and it is labeled with the action verb in red. This is interesting and makes nice pictures in presentations and papers, but it becomes useful when the graph is converted into a table of triplets from the filtered graph that point to the context from the document where the triplet was extracted. At this point, the researcher finds the sentences that generate the “answers” to the question. See example of this context below.

Context sentences corresponding with Knowledge Graph search for “Economic and Immigration Impact of Ukraine Conflict”

As you can see, there were multiple discussions of immigration and economic challenges in the set of documents and the “answers” to the question found in these documents are captured in the table (Note: I’m just showing the first few rows of the answers). If one wanted to conduct a very thorough literature search of a much larger set of documents, it is likely that this method could save countless hours of digging through documents and enable quicker and better decisions on the subject.
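
For readers who want to see the shape of the approach, here is a minimal sketch in Python. It is not my actual pipeline: the triplets and their source sentences are invented and typed in by hand (in practice they would be extracted with a toolkit such as nltk), and the function names are hypothetical. It shows the three ideas described above: a graph built from Subject->Action->Object triplets, a “question” applied as a keyword filter, and a table that points each surviving triplet back to its context sentence.

```python
from collections import defaultdict

# Hypothetical (subject, action, object, context sentence) records.
# These are invented for illustration, not taken from the real documents.
TRIPLETS = [
    ("invasion", "trigger", "sanctions", "An invasion would trigger sanctions."),
    ("sanctions", "damage", "economy", "Sanctions would damage the economy."),
    ("conflict", "increase", "immigration", "The conflict would increase immigration."),
    ("troops", "cross", "border", "Troops could cross the border."),
]


def build_graph(triplets):
    """Directed graph: each subject maps to its (action, object, context) edges."""
    graph = defaultdict(list)
    for subject, action, obj, context in triplets:
        graph[subject].append((action, obj, context))
    return graph


def question(graph, keywords):
    """Return the triplets, with context, whose subject or object is a keyword."""
    wanted = {keyword.lower() for keyword in keywords}
    rows = []
    for subject, edges in graph.items():
        for action, obj, context in edges:
            if subject in wanted or obj in wanted:
                rows.append((subject, action, obj, context))
    return rows


graph = build_graph(TRIPLETS)
for row in question(graph, ["economy", "immigration"]):
    print(" | ".join(row))
```

Matching on exact keywords is the crudest possible filter. Matching on word stems or synonyms would catch more, at the cost of more noise in the resulting table.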