September 2nd, 2009

168,178,719: the total number of book titles in the world, according to the head of metadata (geekspeak for cataloguing) at Google Books, “when we counted them last Friday”.

I got this nugget from a splendid post at Language Log by Geoff Nunberg, slamming the many mistakes in the catalogue. Some are hilarious: Mae West: an Icon in Black and White is classified under Religion. More obscurely, Moby Dick comes under Computers. The Google guy admits that “we date one edition of A Christmas Carol from a shockingly pre-Gutenberg 1135”. Google have weirdly chosen to use a primitive booksellers’ coding scheme called BISAC. Dixit Nunberg:

In short, Google has taken the great research collections of the English-speaking world and returned them in the form of a suburban mall bookstore.

The blogosphere at its best.

This is more than an enjoyable catfight between experts. Google has vast ambitions and resources – they seriously mean to scan all those 168 million books and nobody is saying they can’t do it. They also have an insurmountable head start. So it’s very unlikely that the effort will ever be duplicated, though it may be complemented in particular areas. We have in prospect a monopoly channel of convenient access to most of the world’s writing. How well Google does the cataloguing and indexing, and how it manages access, are important matters. The Google Books Settlement now before the US courts will be a cornerstone of the information society.

There is one straightforward change in the law that would improve things: roll back automatic copyright renewal. If you take out a patent, you must renew it from time to time or it passes into the public domain. If you couldn’t be bothered, how valuable can your patent be to you? Up to 1978 in the USA, that was the case for copyright: but then the law was changed to make renewal automatic after a first term of 28 years for works published from then on. The start date was moved back to 1964 by amendments in 1992. At the time, it probably looked like a sensible rationalisation: renewal requires paperwork and a bureaucracy which didn’t then look as if they were achieving anything.

Electronic libraries have changed the equation. With the ever-lengthening life of copyright secured by rights-owners through skilful propaganda and bribery (sorry, election contributions), you have a huge, and increasing, number of works on the shelves (and now hard drives) of zero commercial value. But these are still legally in copyright, not in the public domain. They are called orphan works, and they have become a major headache for Google Books, its competitors, if any, and their users, since the e-libraries must track down indifferent heirs and secure their permission before making the books available.

It’s true that restoring renewal would only be a partial solution. It would still leave your Great-Aunt Edna’s Memoirs of a Dakota Childhood (cumulative sales: 135) in full copyright for a long first stretch of 28 years, but it would clear up the confusion for books published between 1964 and today minus 28 years, viz. 1981. Ideally you would require renewal after 10 years, and backdate the rule to liberate all the orphans more than 10 years old.
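To make the date arithmetic concrete, here is a toy Python sketch. It assumes a single first term counted in whole years from the publication year and ignores every other rule of US copyright law; the function name renewal_due_window is purely illustrative. With 2009 as “today” and a 28-year first term it reproduces the 1964 to 1981 window; with a 10-year term the window stretches to 1999.

```python
# Toy model of the renewal proposal (an assumption, not actual law):
# a work gets a first term of `first_term` years and must then be renewed.

def renewal_due_window(current_year: int, first_term: int, start_year: int = 1964) -> range:
    """Publication years from start_year whose first term has already run out.

    A work published in year p reaches the end of its first term in
    p + first_term, so it is due for renewal when p <= current_year - first_term.
    """
    last_year = current_year - first_term
    # range() excludes its stop value, hence the + 1; empty if last_year < start_year
    return range(start_year, last_year + 1)


window = renewal_due_window(2009, 28)
print(window.start, window[-1])  # 1964 1981
```

Returning a range rather than a list keeps the window cheap to build and makes an empty result (no works yet due) fall out naturally.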

The current version of the economically illiterate Berne Convention requires (Article 5.2) that “The enjoyment and the exercise of these rights [of copyright] shall not be subject to any formality”, so the rollback might require a long international negotiation. Still, if not now, when?

The reform would be fought by the powerful copyright lobby and its tame shills. Ex hypothesi, the orphans are commercially worthless. The administrative costs of renewing valuable properties would be invisible to the likes of Disney, so they would have no direct stake. But the reform would underline the fact that intellectual property is not a God-given natural right of les créateurs, but a privilege granted by society in the wider public interest. A dangerous wedge indeed.

Update: there is a follow-on post.


3 Responses to “168,178,719”

  1. [...] a comment » That’s the number of book titles in the world (according to Google’s cataloging project). And Geoffrey Nunberg is [...]

  2. [...] This post was recently mentioned on Twitter by Ryan J. Davis, who said: Google Says: There are 168,178,719 Books In The World, as of last Friday. [...]

  3. [...] says that there are 168,178,719 book titles in the world, but the rush to catalog them is a “metadata train [...]