Digital books need good indexes even more than print books do. The best way for publishers to provide robust content discovery is to hire experienced indexers to craft excellent indexes.

Can you imagine a publishing future in which marketers will compete for the services of experienced indexers? I can imagine it. In fact, I think we’re not that far from it.

Search engines are not very good at returning useful results, because most search indexing is based on automatic processing of the content. We have all experienced frustration in using search engines to find information, especially when the topics we are exploring move beyond the purely factual into the realms of ideas, imagination, and feelings. But some of the most important topics about which we are searching for answers are ones in those realms. In areas like these, search engines have not delivered the value they promised.

In light of this reality, I disagree with those who still put hope in automatic semantic tagging of content as a stand-in for intelligent indexing. Good content discovery requires the creation of intelligent indexes by human indexers.

A good index is a semantic map of the content of a book. The better the index, the better the semantic map it provides. This “semantic map” cannot be adequately engineered by keyword tagging or automated search indexing.

. . . the intellectual part of indexing – the analysis of meaning, significance and uniqueness, then modelling the likely behaviour of human readers and providing for their predicted access paths – cannot be automated.[1]

A good index has long been a valuable component of non-fiction books. But indexes are even more important in the digital future of publishing. Not only will indexes help people discover content within a particular book, but they will help people discover content across a whole library of books and other content. In other words, indexes will become a crucial part of content (and product) discovery.

Here is how it will work:

  • Index entries will be embedded in the content and attached to a range of text — as small as a word, as large as a group of paragraphs or a section.
  • The embedded index entries will be used to create not just print indexes but also a semantic map of the content in that book (or article, or feature).
  • This semantic map will be loaded into a search engine. Using the intelligently crafted semantic map, the search engine will learn from the indexer what the content is about and provide intelligent results.
  • The reader will have a much better search experience using the search engine that is created in this way. The more thoughtfully and intelligently the index has been crafted, the more likely it will be that the search index will yield useful results.
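To make the steps above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular tool: the `IndexEntry` class, the `build_semantic_map` and `search` functions, the character-offset ranges, and the sample text are all hypothetical names and choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One embedded index entry: a heading attached to a range of text."""
    heading: str  # the indexer's chosen term
    doc_id: str   # the book, article, or feature the range belongs to
    start: int    # character offset where the indexed range begins
    end: int      # character offset where the range ends (exclusive)


def build_semantic_map(entries):
    """Group entries by lowercased heading: heading -> list of entries."""
    semantic_map = defaultdict(list)
    for entry in entries:
        semantic_map[entry.heading.lower()].append(entry)
    return semantic_map


def search(semantic_map, texts, query):
    """Return (doc_id, passage) pairs whose index heading matches the query."""
    hits = semantic_map.get(query.lower(), [])
    return [(e.doc_id, texts[e.doc_id][e.start:e.end]) for e in hits]


# Hypothetical content: the word "grief" never appears in the passage,
# but an indexer has attached that heading to it.
passage = "She kept his coat by the door for a year."
texts = {"book-1": passage + " Taxes were due in April."}
entries = [IndexEntry("grief", "book-1", 0, len(passage))]

semantic_map = build_semantic_map(entries)
results = search(semantic_map, texts, "Grief")
print(results)
```

A keyword search for "grief" would miss this passage entirely, because the word is not in the text. The indexer's entry is what connects the reader's query to the content. A real system would also need subheadings, cross-references, and ranking, which this sketch leaves out.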

An excellent search experience is like excellent design: It is largely invisible, but very effective. In marketing, excellent design translates into more sales, even when buyers aren’t aware of the design and don’t realize they are responding to it. In product development, excellent design translates into greater customer satisfaction. Similarly, an excellent search experience leads readers right to the content they are looking for, and can motivate them to come back again in the future. If the search results are linked to product purchase pages, readers are also more likely to buy the book.

As publishers, we don’t need to wait for ebook reading software or global search engines to incorporate more intelligent indexing into their systems. In fact, as long as they don’t, it is a business opportunity for us: We can provide our customers with a better search and content-discovery experience by making use of intelligent indexing that has been done on all of our past (backlist) books, and by enhancing the way we index future books. Primarily, that means finding, hiring, and training excellent indexers to embed their index entries in our publications. Then we can use all that embedded intelligence to enhance content discovery on our own websites and in our own apps.[2] And if we make our content available to global search engines, we can use the embedded index entries to tell them what each piece of content is “about,” enhancing the chances that search engines will connect our content with those topics.[3]
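As one possible illustration of telling a search engine what a piece of content is “about,” the sketch below turns a list of index headings into a JSON-LD description using the schema.org vocabulary (its `Book` and `Thing` types and its `about` property). Choosing schema.org and JSON-LD is an assumption made for this example, as are the `about_markup` function and the sample title and headings. Whether and how a given search engine uses such markup is up to that search engine.

```python
import json


def about_markup(title, headings):
    """Build a schema.org JSON-LD description listing what a work is about."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        # One "Thing" per distinct index heading, in alphabetical order.
        "about": [{"@type": "Thing", "name": h} for h in sorted(set(headings))],
    }


# Hypothetical title and headings, standing in for a book's embedded index entries.
markup = about_markup("A Hypothetical Memoir", ["grief", "memory", "grief"])
print(json.dumps(markup, indent=2))
```

The printed JSON could then be placed on the book’s web page in a `script` element of type `application/ld+json`. Once the index entries exist, producing markup like this is the mechanical part.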

Eventually, the world around us will catch up: Ebook reading platforms will probably support embedded index entries. When that happens, it will be a fairly straightforward matter to provide those platforms with an upgraded digital file that contains what they need to make use of the embedded index entries.[4] Publishers who do the groundwork in advance will have a significant leg up over those who don’t.[5]

Note: While preparing this piece, I went on Adobe Stock to find a photograph of a book index, but most of what I found was pictures of phone books, dictionaries in German, and other irrelevant things. Does everyone have this much trouble with Adobe Stock? It seems that they need to hire some experienced indexers to improve the semantic tagging of their photos. In the end I “found” an image by making one: I scanned The Chicago Manual of Style, 15e, p. 800. 


[1] Bill Johncocks, “New technology and public perception,” The Indexer, vol. 30, No. 1 [March 2012], p. 10. Link: http://www.ingentaconnect.com/content/index/tiji/2012/00000030/00000001/art00003 (accessed April 25, 2016). I’ve uploaded a copy of this article with my own comments and highlights: New technology and public perceptions

[2] Some specialized tools and processes are needed to do these things, of course. There are a number of options available. If you want help, feel free to get in touch.

[3] The “semantic web” is the practice of embedding index entries (often called “semantic markup”) into content on the web, in order to inform search engines as to what that content is about. There is an overview page on Wikipedia (https://en.wikipedia.org/wiki/Semantic_Web), a W3C standard (http://www.w3.org/standards/semanticweb/), and an introductory site on the topic of creating semantic web pages (http://semanticweb.org). Once you have index entries embedded in content, creating semantic web pages that use these index entries becomes a mechanical exercise.

[4] The EPUB3 indexes specification (http://www.idpf.org/epub/idx/) is the most likely path by which reading software will begin to support indexes.

[5] The U.K.-based Society of Indexers has a very good introductory page on “Standards and Technologies” for indexing in the digital age: http://www.ptg-indexers.org.uk/about/technologies.htm.

Posted by Sean Harrison
