Monday, 15 April 2013

More on Scans of Old Books

The ghost in the machine
I was taken to task by a few people yesterday for implying that anyone could be dissatisfied with the Google Books and Gutenberg work to digitise old books. Surely, it was argued, we can hardly complain - it is wonderful that someone should take it upon themselves to provide such a resource.

I wouldn't argue with that. My only gripe is that it would be nice if the quality of the job were always up to the worthiness of the intention. I imagine the scanning work being carried out by underpaid but overqualified inmates of a big library somewhere. Whoever does it, God bless them, but a bit of inspection and quality control after the event would offer so much extra value. The picture at the top of this post is an excerpt from Google Books' PDF version of Les Allemands sous les Aigles Francaises - Tome 1 - Le Regiment de Francfort by Lt Colonel Sauzey. It is good that the anonymous backroom staff should occasionally get a bit of visibility, and also good to note that Google obviously encourage the practice of safe librarianism.

In addition, I wish to make some quick - and largely uneducated - observations about the products of a company or companies known variously as Nabu, Biblio and other things, whose mission in life is to make rare old books available in print once more, by exactly this same scanning process. Some of these books are print-on-demand products. I would describe them as approximate facsimiles. I have nothing at all to say about the copyright implications, nor about the apparent furore arising from the public thus having access to works which otherwise exist only in American libraries or private collections. I think I have two specimens of Nabu's output. They are alarmingly slipshod, and the books are not especially cheap.

Strange that the translator of Foy's work knew HTML?
For example, I have the 2-volume English translation of Maximilien Foy's (that's me!) History of the War in the Peninsula, under Napoleon. It has misprints on the covers, no less. One volume failed to get a title on the spine; the front cover of the other is illustrated here - you will note that the main title includes the expression "&NBSP", which is, of course, the HTML entity that tells a web browser to insert a "non-breaking space". Classy, eh? A real attention to quality - a real pride in the mission - old books reproduced with care and love for the benefit of future generations.

These books also have a surprising number of missing pages - presumably the operator sneezed, or ended his shift, at these points. Fortunately I have a complete PDF version of the same books, so I have been able to fill the gaps in my own copies by including printouts of the required pages. Something seems not quite right, though, and I am not comforted by Nabu's published policy statement, part of which says:

This book may have occasional imperfections such as missing or blurred pages, poor pictures, errant marks, etc that were either part of the original artefact, or were introduced by the scanning process. We believe this work is culturally important, and despite the imperfections, have elected to bring it back into print as part of our continuing commitment to the preservation of printed works worldwide. We appreciate your understanding of the imperfections in the preservation process, and hope you enjoy this valuable book.

Cobblers. Are there any grown-ups at home?

I thank Nabu for their good wishes, and note that they are also committed to saving on production costs by not bothering overmuch with normally accepted ideas on quality assurance.

Not recommended.


  1. Cobblers indeed. That is an especially lame apologia. The problem of imperfect scans might be solvable by better pay, supervision and proofing, but manual, page-by-page scanning of print books is still a fairly primitive solution. It would be interesting to look ahead 25 years to see if we could convert paper books to truly digital, searchable and interactive versions of themselves rather than the fairly dumb scans we have at the moment. Good post.

  2. That's outrageous. The publisher is basically saying "we know this book is a load of crap, but we've got your money, we can't be bothered trying any harder and there's nothing else available, so what are you going to do about it?"

    Good-quality automated scans capable of understanding a wider range of fonts must be just around the corner. Mind you, Nabu will probably make a fortune out of that.


