Occam’s Razor: who is most likely to be Satoshi Nakamoto?

Bitcoin is a very non-obvious idea, of the kind that takes years of crystallization; maybe an entire lifetime. It is an idea born from the cypherpunk mindset, developed over time by a person with a deep passion for economics of the most abstract kind and a mastery of unusual cryptography concepts. The recent suggestions in the press that every other person with a scientific background could be the creator of Bitcoin only serve to show how little some “journalists” understand about Bitcoin and its origins.

Previously on this blog, I talked about how a search for characteristic turns of phrase occurring in the original Bitcoin whitepaper led me to the essays of Professor Nick Szabo. Here I recapitulate what I have found so far, and present new results from a stylometric analysis applied to the usual suspects of Bitcoin.

First, let us review the facts known about Nick Szabo (NS).

  • NS is an exceptionally brilliant academic at the intersection of computer science, law, and economics. He invented both Bitgold and Smart Contracts. The term “genius” is likely appropriate.
  • NS had been working (alone) on a decentralized digital cryptocurrency project since 1998, provisionally named Bitgold. He is one of a handful to have done research in this niche topic before Bitcoin. Others include Wei Dai, Hal Finney, Adam Back, and David Chaum.
  • In April 2008, a few months before the original Bitcoin announcement, he seemed to have reached a tipping point in his work, and publicly asked for help “coding up [a demonstration]” [0].
  • Days after the publication of the original Bitcoin whitepaper, NS went and post-dated all public mentions of his research so that they appear to have been written after the Bitcoin announcement [1] [2].
  • The original Bitcoin paper makes no mention of NS’s research, even though it is largely based on it. Instead, it cites a few people who appear to have inspired NS’s research [3].
  • The Bitcoin paper was initially released on the Cryptography Mailing List, which NS was very familiar with. It took the form of a scientific paper, respectful of all academic conventions, which points to an academic.
  • NS became completely silent about his research after asking for help in April 2008.
  • When asked about the identity of Satoshi Nakamoto, NS gave a reply implying that he knew who Satoshi was [4]. NS never replied when asked directly if he was Satoshi Nakamoto.
  • The timing of the forum posts of Satoshi Nakamoto indicates he was located in the EST timezone, the same as NS [5].
  • An analysis of the content-neutral expressions found in the Bitcoin whitepaper indicates a match with NS’s writing tics, at a level that has only a one in a thousand chance of being a coincidence [6].

Let us pause here. I have received comments from people concerned that the analysis was flawed. I would argue that the analysis is correct, if you accept its underlying hypotheses (which were detailed in the original post). Let us review the hypotheses of the model, and see if they make sense:

  • (1) We assume that all researchers have a “vocabulary set” of content-neutral expressions that they use in their papers. These sets vary from researcher to researcher.
  • (2) We assume that if an expression is in a researcher’s vocabulary, this researcher will use it at least once in 10 papers (this hypothesis is very generous, as in fact researchers tend to reuse the same expressions all the time).
  • (3) We assume that the uses of each expression are statistically independent.

We then considered 4 rare content-neutral expressions, and assessed how frequently they were used in cryptography papers. This was done using Google Scholar. The frequencies obtained were 0.01, 0.05, 0.015, and 0.01.

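Under (2), a researcher who has an expression in their vocabulary uses it in at least one paper out of 10, so the share of researchers who have that expression is at most 10 times its frequency across papers. Therefore the probabilities that a researcher would have these expressions as part of their vocabulary are at most 0.1, 0.5, 0.15 and 0.1 respectively.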

Therefore the joint probability that a random cryptography researcher would have all of these expressions as part of their paper-writing toolbox is 0.1 × 0.5 × 0.15 × 0.1 = 7.5e-4, or ~1e-3 (according to (3)). Nick Szabo uses these 4 expressions frequently in his papers and essays. Therefore he is one in a thousand who could have written a paper using all of these expressions.
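The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation as described in this post: the variable names are mine, and the factor of 10 is simply hypothesis (2) turned into a multiplier.

```python
import math

# Per-paper frequencies of the four expressions, as quoted in the text.
paper_frequencies = [0.01, 0.05, 0.015, 0.01]

# Hypothesis (2): an expression in a researcher's vocabulary is used in at
# least one paper out of 10, so the share of researchers who have it is at
# most 10 times its per-paper frequency.
PAPERS_PER_USE = 10

# Probability that a random researcher has each expression (capped at 1).
vocab_probs = [min(1.0, f * PAPERS_PER_USE) for f in paper_frequencies]

# Hypothesis (3): independence, so the joint probability is the product.
joint = math.prod(vocab_probs)

print("per-expression probabilities:", vocab_probs)
print("joint probability:", joint)
```

The product comes out at about 7.5e-4, which is the "one in a thousand" figure once rounded. Changing PAPERS_PER_USE shows how sensitive the result is to hypothesis (2).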

I believe this reasoning is sound. If you disagree, I invite you to attack it, for instance by pointing out counter-examples that would show that (1), (2), or (3) are flawed.

Let us get back to our list of NS-related facts.

  • After the release of the Bitcoin proposal and then the software, NS stayed silent about it for a long time, even though it was the realization of his life project. One would have expected him to get at least a little excited about that.
  • An analysis of the stylometric characteristics of the Bitcoin whitepaper indicates a stronger match for Nick Szabo than for other researchers involved with cryptocurrencies, such as Wei Dai, Hal Finney, David Chaum or Adam Back.

Let us pause again here. This is a bit of research that I did not previously publish here. Let us review what I did, and the results I obtained.

I took extensive writing samples from Wei Dai, Adam Back, Hal Finney, David Chaum, and Nick Szabo. Samples were from 5k to 40k words long. I then computed histograms of word length frequency and character frequency, and compared these histograms with those of the original Bitcoin whitepaper. Here are my results. Units are arbitrary (smaller scores mean closer histograms).

Word length distribution

  1. Diff Nick Szabo / Bitcoin: 0.160
  2. Diff Wei Dai / Bitcoin: 0.241
  3. Diff David Chaum / Bitcoin: 0.257
  4. Diff Adam Back / Bitcoin: 0.337
  5. Diff Hal Finney / Bitcoin: 0.510

Character frequency distribution

  1. Diff Nick Szabo / Bitcoin: 0.191
  2. Diff Wei Dai / Bitcoin: 0.208
  3. Diff David Chaum / Bitcoin: 0.228
  4. Diff Hal Finney / Bitcoin: 0.284
  5. Diff Adam Back / Bitcoin: 0.342

This analysis would need to be run against thousands of potential candidates (all researchers known to have worked on cryptocurrencies / proof-of-work algorithms / etc) in order to be truly significant. But it already confirms that Nick Szabo’s writing matches the Bitcoin whitepaper, not only in the expressions it uses, but also on hard-to-fabricate style metrics.
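For readers who want to try this kind of comparison themselves, here is a minimal Python sketch of the two histograms described above. It is not the exact script behind the scores in this post: the tokenization, the cap on word length (max_len) and the distance measure (sum of absolute differences between normalized histograms) are all assumptions chosen for illustration, so the numbers it produces will not be on the same scale as the ones listed.

```python
import re
import string
from collections import Counter


def word_length_hist(text, max_len=15):
    """Normalized histogram of word lengths; words longer than max_len share the last bin."""
    words = re.findall(r"[A-Za-z']+", text)
    counts = Counter(min(len(w), max_len) for w in words)
    total = sum(counts.values()) or 1  # avoid division by zero on empty text
    return [counts[n] / total for n in range(1, max_len + 1)]


def char_freq_hist(text):
    """Normalized histogram of the letters a-z, case-insensitive."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters) or 1
    return [counts[c] / total for c in string.ascii_lowercase]


def hist_distance(a, b):
    """Assumed metric: sum of absolute differences (0 means identical histograms)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Placeholder strings; in practice these would be the full writing samples.
whitepaper = "We propose a solution to the double-spending problem."
candidate = "A short placeholder sample standing in for a candidate's essays."

print("word length diff:", hist_distance(word_length_hist(whitepaper),
                                         word_length_hist(candidate)))
print("char freq diff:  ", hist_distance(char_freq_hist(whitepaper),
                                         char_freq_hist(candidate)))
```

Running the same two comparisons for each candidate and sorting by score gives rankings of the kind shown above. With samples as short as these placeholders the scores are meaningless; samples of several thousand words are needed before the histograms stabilize.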

If you would like to help run this analysis at scale, or if you have evidence for or against the case of Nick Szabo being Satoshi Nakamoto (quite likely along with one or more technical collaborators), contact me at skye.grey@yandex.com. I will add any new evidence to this blog.

[0] http://unenumerated.blogspot.com/2008/04/bit-gold-markets.html#3741843833998921269

[1] http://unenumerated.blogspot.com/2005/12/bit-gold.html

[2] http://unenumerated.blogspot.com/2008/04/bit-gold-markets.html

[3] https://bitcoin.org/bitcoin.pdf

[4] https://twitter.com/AdrianChen/status/407542548844929025

[5] http://www.wired.com/magazine/2011/11/mf_bitcoin/all/

[6] https://likeinamirror.wordpress.com/2013/12/01/satoshi-nakamoto-is-probably-nick-szabo/