The Voynich Manuscript

I'm hooked. The Voynich Manuscript (VMS) is an engrossing puzzle, and I very much want to know what it says and how it says it. For anyone new to the puzzle, the site provides an excellent introduction, as well as more in-depth material and links to many of the other VMS sites out there. (I'm not the only one who's been sucked in!) This manuscript embodies my interests in medieval herbals, astrology, science and cryptography, all wrapped in a single enticing and enigmatic package.

I poked at it a little when putting together a cryptography class in 2004, but didn't develop a serious interest until the winter of 2005-6. Since then, I've tried a number of computational approaches, none successfully. But then, nobody else has succeeded yet either...

My analysis and presentation have all been carried out in the open-source statistical package R, since I use it for work and am already adept with it. The code for all my analyses will be available here as Sweave files, which combine statistical code, documentation and presentation in a single file. If you are interested in running them, you will need the ecodist package, available from CRAN. All of my source data files are available online. The Sweave output was converted from TeX by TTH, version 3.76.

The main analyses rely on the ordination method Principal Coordinates Analysis, a variety of metric multidimensional scaling. Very briefly, the algorithm takes calculated distances between sets of items (in this case paragraphs or pages, compared by either word or character occurrences) and fits them into a lower-dimensional space so that the original distances are preserved as closely as possible. Think of it like taking the mileage table from the back of a road atlas and using it to reconstruct a map of the cities. Points that end up closer together are more similar than points that are farther apart. Ordination is needed because the raw data can't be plotted directly: with counts of 20 different characters, you would need 20 dimensions to visualize the data. Ordination methods reduce the number of dimensions needed while losing as little information as possible.
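
To make the road-atlas picture concrete, here is a minimal sketch of the idea. It is written in plain Python rather than R, and it is not the ecodist code behind the analyses here. The city coordinates are made up, the function name pcoa is purely illustrative, and the eigenvectors are found by simple power iteration, which assumes the distances are Euclidean (so there are no negative eigenvalues to worry about).

```python
import math

def pcoa(dist, k=2, iters=500):
    """Classical metric MDS: place n items in k dimensions,
    given a symmetric n x n matrix of distances between them."""
    n = len(dist)
    # Square the distances, then double-centre them: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        # Power iteration: repeatedly multiply by B and renormalise
        # to find the leading eigenvector of what is left of B.
        v = [math.sin(i + 1.0 + axis) for i in range(n)]  # arbitrary fixed start
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # eigenvalue estimate
        if lam <= 1e-9:
            break  # no structure left; remaining axes stay at zero
        # Coordinates on this axis are the eigenvector scaled by sqrt(eigenvalue).
        for i in range(n):
            coords[i][axis] = v[i] * math.sqrt(lam)
        # Deflate: remove this axis from B so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords

# Hypothetical "cities" on a flat map, and the mileage table between them.
cities = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (6.0, 5.0), (2.0, 7.0)]
dist = [[math.dist(p, q) for q in cities] for p in cities]

# Rebuild a map from the mileage table alone.
coords = pcoa(dist, k=2)
worst = max(abs(math.dist(coords[i], coords[j]) - dist[i][j])
            for i in range(len(cities)) for j in range(len(cities)))
print("largest error in reconstructed distances:", worst)
```

The rebuilt map may come out rotated or mirrored relative to the original, since the distances alone say nothing about orientation, but the distances between the points are reproduced. With real text data there are usually more than two meaningful axes, so a two-dimensional plot is an approximation rather than an exact reconstruction.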

All of these are works in progress, and the interpretation and discussion may be somewhat sketchy. I'd welcome any comments.