Voynich Reconsidered: a research philosophy

Voynich Reconsidered is my second book in the series that I like to call “Great 20th Century Mysteries”. The publisher is Schiffer Books of Atglen, Pennsylvania.

I have taken this opportunity to set out some of the ideas that underlay my philosophy in writing this book.

Voynich Reconsidered cover

Image credit: Schiffer Books.

It is universally acknowledged that the Voynich manuscript is a mystery. I believe it was Brigadier John H. Tiltman, writing in 1967 for the US National Security Agency, who first gave it the sobriquet “the most mysterious manuscript in the world”. For some researchers, the mystery lies chiefly in the bizarre drawings of plants, cosmic objects, signs of the zodiac, steampunk receptacles, and little naked ladies in pools of blue or green water. For others (and for myself), the mystery resides in the hundreds of thousands of elegant glyphs, arranged in strings that have the appearance of words.

By some interpretations, there are over two hundred distinct glyphs, though only about twenty are used extensively. A handful of these glyphs resemble letters in Latin script, or Arabic numerals; one or two resemble the ornamental flourishes in a monastic letter dated 1172 from San Savino, Italy; the majority resemble nothing on this earth, and have never been seen anywhere else.

In calling the manuscript a twentieth century mystery, I was inclined to date the birth of the mystery either to 1912 when, according to Wilfrid Voynich, he rediscovered the manuscript which later bore his name; or to 1921 when Voynich made his first presentation on the manuscript, to an audience in Philadelphia.

The calfskin vellum, on which the drawings and glyphs were inscribed, has been sampled for radiocarbon analysis. The five samples yielded dates predominantly between the mid-fourteenth and the mid-fifteenth century. The inks are inorganic, and no extant technology permits us to date them.

We are therefore free to conjecture any date we like (after about 1450, and no later than 1921) for the creation of the manuscript. However, to my mind the probabilities favor the fifteenth century. I think that the more time we allow between the vellum and the writing, the smaller the chance that the creators could have found a stock of unused material.

In approaching the mystery, I made a decision to focus on what, for want of a better word, I called the text. I use the term “text” to refer to the strings of glyphs that look like words. The strings are, for the most part, arranged in horizontal lines that appear to have been written from left to right. The lines are often arranged in groups that we might call paragraphs. There is nothing resembling punctuation, and therefore there is no sequence of glyph strings that we could call a sentence. There is no indication that the text wraps from one line to the next, nor from one page to the next.

The search for meaning

The central element of the mystery appeared to me to be whether or not the text contained meaning.

In my research for Voynich Reconsidered, I became aware of an argument that the text had no meaning. On this view, the scribes of the Voynich manuscript generated the text by means of an algorithm. Some authors have proposed that a suitable algorithm can produce text which emulates some of the statistical properties of the Voynich manuscript.

I believe that the hypothesis of the absence of meaning is not amenable to proof. In any case, to my mind it was more interesting to assume that the text contained some meaning, and to see where that assumption might lead.

As my work progressed, I began to imagine the workplace where the manuscript was created. There must have been a producer: a person who conceptualised the project and paid for it. There was a team of scribes: at least two and possibly up to eight, according to the pathbreaking researcher Prescott Currier. One can easily imagine assistants: furnishing, cleaning, catering, refreshing the artistic supplies. The work must have taken at least several months, maybe years. No matter in what century this took place, the producer must have been wealthy, to afford the materials and manpower for such a project.

We can imagine the producer also as director. He or she had a vision for what was to be produced: a manuscript of over two hundred pages, bursting with illustrations in color, and crammed with text in a script sui generis. He or she had to convey that vision to the scribes, and to set down the instructions whereby they would execute it. It seemed to me, mindful of Occam’s Razor, that those instructions should be sufficiently simple that the scribes could follow them without continual recourse to the director. Likewise, it seemed probable that a single set of instructions prevailed over a period of months or years.

This brought me to the idea of an author: more specifically, a source document or documents.

Precursors

It seemed to be a logical and simple assumption that the text of the Voynich manuscript had a precursor. This document (or set of documents) contained all the meaningful text that would be entered in the Voynich manuscript. If these documents were meaningful, they had to be written in natural human languages: ones that were spoken and written, or at least understood, in the place where the manuscript was created.

These considerations led me to think about where that place might be.

In 1921, Voynich told his American audience that he had found the manuscript in southern Europe. After his death, his widow Ethel Lilian Voynich disclosed that the place, more precisely, had been Frascati in Italy. The languages spoken in Italy in the fifteenth century were Latin and a variety of vernacular languages, among which Tuscan would become the foundation of modern Italian. I recalled that Dante Alighieri had written La Divina Commedia in Tuscan, in 1308-1321, and Monarchia in Latin around 1312-1313. If the manuscript had been produced in Italy, Occam’s Razor would direct us towards Latin and Tuscan-Italian as the most probable precursor languages.

Nevertheless, it would be prudent to allow for the manuscript to have travelled to the place where Voynich found it. Here, I was mindful of the distance-decay hypothesis: the well-documented concept that, the greater the distance between two places, the less the probability of any human or material transaction between those places. I was therefore disposed to think of concentric rings, centered on Italy, and to place the origin of the manuscript, and its precursor languages, within those rings.

If this idea had merit, I could conjecture that the next most probable precursor languages would be French, German and Albanian (as spoken and written at the presumed time of creation of the manuscript). After that, one might reasonably look at the Slavic and Iberian languages, English and Greek. Farther afield, I felt that the probabilities were not such as to justify the research effort.

In short, I could identify at least ten or a dozen European languages which seemed worthy of the effort of correlation with the Voynich manuscript.

Occam

As to the analytical approach: I felt that the Voynich producer would have wished to simplify the task of the scribes. I could imagine a scenario in which the producer handed the scribes a set of documents, written in European languages, in (most probably) the Latin script, or (less probably) another script such as Cyrillic, Greek or Glagolitic. The producer then instructed the scribes to transcribe these documents into the Voynich glyphs. To my mind, the simplest possible instruction would be a one-to-one mapping: one Latin (or other) letter corresponding uniquely to a Voynich glyph.

I was aware of arguments that any such mapping would not necessarily be one-to-one; that some form of encipherment had taken place, before or after the mapping; or that some unknown part of the Voynich text was meaningless filler or junk. All of this is possible. Indeed, in Voynich Reconsidered I have addressed some alternatives: for example, mappings based on bigrams or trigrams, or mappings between glyph strings and precursor words. I considered the possibility that a glyph in the initial position might map differently from the same glyph in the final or an intermediate position. But as a research philosophy, I felt that the permutations of a one-to-one mapping should be thoroughly explored before a major effort was devoted to other hypotheses.

If our working assumption (and it is no more than that) is that the Voynich scribes used a one-to-one mapping, then we have a useful tool in the form of frequency analysis. Edgar Allan Poe made frequency analysis the centerpiece of his story "The Gold-Bug"; so did Arthur Conan Doyle in the Sherlock Holmes story "The Adventure of the Dancing Men". A one-to-one mapping of letters to other letters, or to any other group of symbols, preserves the frequencies of the original letters. In all or most natural languages, the letters have a signature frequency distribution: in modern English, the most common letters are E, T, A, O, I and N.
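For readers who program, the preservation of frequencies can be checked in a few lines. The following is a minimal sketch in Python, not a method taken from the book: the plaintext, the symbols in `KEY` and the function names are all invented for illustration. The reference ranking is taken from the plaintext itself, so the recovery is exact by construction; with a real reference corpus the rankings would only be close.

```python
from collections import Counter

def rank_by_frequency(symbols):
    """Return the distinct symbols, most frequent first (ties broken by symbol order)."""
    counts = Counter(symbols)
    return [s for s, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def substitute(text, key):
    """Apply a one-to-one substitution; symbols missing from the key pass through."""
    return "".join(key.get(ch, ch) for ch in text)

# Toy plaintext with letter counts e=9, t=6, n=4, s=3, a=2 (no ties).
plain = "sees eaten tent seat teen ten".replace(" ", "")

# Hypothetical key, invented for this illustration only.
KEY = {"e": "@", "t": "#", "n": "%", "s": "&", "a": "*"}
cipher = substitute(plain, KEY)

# The counts survive the substitution; only the labels change.
plain_counts = sorted(Counter(plain).values(), reverse=True)
cipher_counts = sorted(Counter(cipher).values(), reverse=True)

# Pair the n-th most frequent cipher symbol with the n-th most frequent letter.
guess = dict(zip(rank_by_frequency(cipher), rank_by_frequency(plain)))
decoded = substitute(cipher, guess)

print(plain_counts, cipher_counts)
print(decoded == plain)
```

The same pairing by rank is what a test against a candidate precursor language would do, with the reference ranking drawn from a corpus in that language rather than from the plaintext.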

Readers may object that many researchers have tried frequency analysis on the Voynich manuscript. Indeed, they have; but I believe that the permutations have never been fully explored. In using the term “permutations”, I have in mind a matrix with at least two axes. One axis is the precursor language. We may have to test at least ten or twelve such languages. Another axis is the text itself.

Here, we need to elaborate what we mean by the text.

The Voynich manuscript resides at the Beinecke Rare Book and Manuscript Library, at Yale University. The Library has performed an immense service to researchers in scanning every page of the Voynich manuscript, in color and at high resolution, and in making those images freely available on the internet. The text is there for all to see.

What is a glyph?

Nevertheless, we still have to ask at least two questions: what is a glyph? and what is a glyph string?

In the era before cheap computing power, researchers were obliged to create transliterations of the Voynich text, according to their respective individual perceptions of where a glyph began and ended, and where a glyph string began and ended. The two most widely used transliterations are Jorge Stolfi’s EVA (now interpreted as Extensible Voynich Alphabet) and Glen Claston’s v101. We might more properly refer to the transliterations as keyboard assignments, since the main objective was to map Voynich glyphs to symbols on an English keyboard. Inevitably, there were differences of interpretation.

Here is just one example. Among the Voynich glyphs, there is one which resembles the Arabic numeral 8. In EVA, this glyph is assigned the letter d (with no necessary intention that it would have been pronounced as d, or mapped from the Latin letter d). In v101, the same glyph receives four keyboard assignments: 8, 7, 6 and &, distinguished by minor variations in the quill strokes. In EVA, a glyph resembling a stylised letter m is assigned the keys iin; in v101, the glyph is simply m. Thereby, one of the famous and ubiquitous Voynich “words” in EVA is daiin, and in v101 is 8am, or 7am, or 6am, or &am.

Clearly, a glyph frequency distribution derived from EVA will not be applicable to v101, and vice versa.
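A small sketch in Python makes the point concrete. It counts symbols in the same three occurrences of the "word" discussed above, keyed once in EVA and once in v101. The repetition of the word is made up for illustration, and treating each keyboard character as one glyph is a simplifying assumption.

```python
from collections import Counter

def glyph_counts(words):
    """Count every keyboard character across a list of glyph strings."""
    return Counter("".join(words))

# The same passage under two keyboard assignments (toy data, not a real page).
eva_words = ["daiin", "daiin", "daiin"]
v101_words = ["8am", "7am", "&am"]

eva = glyph_counts(eva_words)
v101 = glyph_counts(v101_words)

print(sum(eva.values()), dict(eva))
print(sum(v101.values()), dict(v101))
```

Under EVA the passage has fifteen symbols and the most frequent is i; under v101 it has nine symbols and no i at all. Any ranking of glyphs by frequency therefore has to be computed separately for each transliteration.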

When we turn to glyph strings, we encounter a problem which has been less explored but is nevertheless significant. We do not know where a glyph string begins and ends. For sure, in the manuscript there are spaces, of variable width, between glyph strings; and glyph strings seem to stop when they reach the right-hand margin of the page. It is tempting to assume that spaces and line breaks represent word breaks in the presumed precursor documents. But on our modern keyboards, we have a “space” key; when we press it, the computer enters a “space” character, which is usually invisible on the screen. By analogy, spaces and line breaks in the Voynich manuscript might represent letters in the precursor documents: in which case, all analysis of Voynich “words” will break down.

There is more. In the Voynich manuscript, there are thousands of glyph strings which seem to have two parts, each of which is a “word” occurring elsewhere in the manuscript. Are these strings compound “words”; or should we read each such string as two “words”, with the word break accidentally or intentionally omitted? We do not know.
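Finding such two-part strings is a simple exercise for a programmer. The sketch below, in Python, lists every string that can be cut into two pieces which both occur as strings in their own right. The word list is a toy example invented for illustration, not an extract from any transliteration file, and the function names are hypothetical.

```python
def two_part_splits(word, vocabulary):
    """Return every (left, right) cut of `word` where both parts are in `vocabulary`."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in vocabulary and word[i:] in vocabulary
    ]

def find_compounds(words):
    """Map each candidate compound to the ways it can be cut into two known strings."""
    vocabulary = set(words)
    found = {}
    for word in sorted(vocabulary):
        splits = two_part_splits(word, vocabulary)
        if splits:
            found[word] = splits
    return found

# Toy word list, invented for this illustration only.
sample = ["daiin", "chol", "choldaiin", "qokeedy"]
compounds = find_compounds(sample)
print(compounds)
```

The code only identifies candidates. Whether a candidate is a compound "word" or two "words" with a missing break remains a matter of interpretation, and each choice yields a different version of the text.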

To my mind, these entirely legitimate differences of interpretation required a perception of the Voynich text, not as a single document but as multiple documents, each one based on a permutation of the assumptions that we made about where glyphs and glyph strings began and ended.

I suspect that the late Glen Claston had something like this in mind when he created v101. I conjecture that he intended to allow his readers, if they so wished, to combine some of his multiple keyboard assignments, for example 8, 7, 6 and &; or even to disaggregate some of his assignments, so that his [m] could become [in] or [iiN]. In such situations, the readers would generate successor transliterations which could be numbered v102, v103 and so on. I did precisely this in one of the chapters of Voynich Reconsidered, with alternative transliterations which I numbered up to v104. In subsequent unpublished research, I have experimented with transliterations numbered up to v112.
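As a rough illustration of how a successor transliteration might be generated by program, the Python sketch below applies a merge rule and a split rule to the four v101 readings of the "word" discussed earlier. The rule tables and the function name are assumptions made for this example; they are not the definitions of v102 to v112, and the sketch assumes one keyboard character per glyph.

```python
def apply_rules(word, merges=None, splits=None):
    """Rewrite a glyph string: first merge variant glyphs, then split composite ones."""
    merges = merges or {}
    splits = splits or {}
    out = []
    for glyph in word:
        glyph = merges.get(glyph, glyph)        # e.g. treat 7, 6 and & as 8
        out.append(splits.get(glyph, glyph))    # e.g. expand m into iiN
    return "".join(out)

# Hypothetical rule tables, based on the examples in the text.
MERGE_EIGHTS = {"7": "8", "6": "8", "&": "8"}
SPLIT_M = {"m": "iiN"}

words = ["8am", "7am", "6am", "&am"]

merged = [apply_rules(w, merges=MERGE_EIGHTS) for w in words]
split = [apply_rules(w, splits=SPLIT_M) for w in words]
both = [apply_rules(w, merges=MERGE_EIGHTS, splits=SPLIT_M) for w in words]

print(merged)
print(split)
print(both)
```

Each distinct pair of rule tables gives one variant of the text, and each variant has its own glyph frequencies.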

The matrix

With (say) twelve possible precursor languages, and (say) twelve possible transliterations, we have a matrix of 144 permutations of our assumptions. Each permutation will generate a mapping between letters in a precursor language and glyphs in the Voynich manuscript. For each permutation, we then have to select a suitable chunk of the Voynich manuscript, map it to the precursor language, and see whether the result makes any kind of sense. That final step requires a good knowledge of the precursor language (probably a medieval version of the language), or at least access to large corpora of text in that language.
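The matrix itself is easy to enumerate. In the Python sketch below, seven language names come from the discussion above; the Slavic and Iberian entries are placeholders, since no specific languages are named, and the transliteration labels simply run from v101 to v112.

```python
from itertools import product

# Seven named languages plus placeholders (assumptions) to make up twelve.
languages = [
    "Latin", "Tuscan-Italian", "French", "German", "Albanian",
    "English", "Greek",
    "Slavic language A", "Slavic language B", "Slavic language C",
    "Iberian language A", "Iberian language B",
]

# Transliteration labels v101 through v112.
transliterations = [f"v{n}" for n in range(101, 113)]

# Every pairing of a precursor language with a transliteration.
cells = list(product(languages, transliterations))

print(len(cells))
print(cells[0], cells[-1])
```

Each of the 144 cells would then be handed to a routine that builds a frequency-based mapping and applies it to a chunk of the manuscript.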

There are more axes to the matrix. In any natural language, frequency analysis is quite accurate for the most common letters. But languages evolve over time, and even in the same era, the letter frequencies will not be identical from one document to another. We should not expect the letter frequencies in the Gettysburg Address to be identical to those in a modern State of the Union address. Therefore, our testing needs to allow for permutations in the ranking of letters, at least for the less common ones.
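One way to allow for such permutations, sketched here in Python, is to hold the most common letters in place and reorder the remainder only within small neighbouring blocks. The function name and the parameters `fixed` and `window` are assumptions made for this illustration; other schemes for perturbing the ranking are equally possible.

```python
from itertools import permutations, product

def candidate_rankings(ranking, fixed=2, window=2):
    """Yield rankings that keep the first `fixed` letters in place and
    reorder the rest only within consecutive blocks of `window` letters."""
    head, tail = ranking[:fixed], ranking[fixed:]
    blocks = [tail[i:i + window] for i in range(0, len(tail), window)]
    for choice in product(*(permutations(block) for block in blocks)):
        yield list(head) + [letter for block in choice for letter in block]

# The six most common letters in modern English, as given in the text.
english = list("etaoin")

candidates = ["".join(r) for r in candidate_rankings(english, fixed=2, window=2)]
print(candidates)
```

With two letters fixed and blocks of two, the six-letter ranking yields four candidates. The count grows quickly with the block size, which is one reason the full matrix is too large to test by hand.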

Here, I think, is where research on the Voynich manuscript, over the last century and more, has run up against the limits of individual patience and persistence. Testing all the permutations is a large effort: too large for an individual (such as myself) with a laptop and little or no knowledge of programming. Fortunately, in modern times computing power is cheap, and programming skills are widespread.

In Voynich Reconsidered, I have attempted to set out a strategy whereby interested and motivated readers, or groups of readers, could approach the text of the Voynich manuscript. They would require a clear focus, no preconceived narratives, skills in some programming languages, and plenty of computing power. This strategy, I believe, will discover meaning somewhere in the Voynich manuscript, if meaning is there to be found.

Published on September 22, 2023 10:16 • Tags: claston, commedia, currier, dante, monarchia, stolfi, v101, voynich

Robert H. Edwards's profile