Occam’s Razor: who is most likely to be Satoshi Nakamoto?

Bitcoin is a very non-obvious idea, of the kind that takes years of crystallization; maybe an entire lifetime. An idea born from the cypherpunk mindset, developed over time by a person with a deep passion for economics of the most abstract kind and a mastery of unusual cryptography concepts. The recent suggestions in the press that every other person who has a scientific background could be the creator of Bitcoin only serve to show how little some “journalists” understand about Bitcoin and its origins.

Previously on this blog, I talked about how a search for characteristic turns of phrase occurring in the original Bitcoin whitepaper led me to the essays of Professor Nick Szabo. Here I recapitulate what I have found so far, and present new results from a stylometric analysis applied to the usual suspects of Bitcoin.

First, let us review the facts known about Nick Szabo (NS).

  • NS is an exceptionally brilliant academic at the intersection of computer science, law, and economics. He invented Bitgold, he invented Smart Contracts. The term “genius” is likely appropriate.
  • NS had been working (alone) on a decentralized digital cryptocurrency project since 1998, provisionally named Bitgold. He is one of a handful to have done research in this niche topic before Bitcoin. Others include Wei Dai, Hal Finney, Adam Back, and David Chaum.
  • In April 2008, a few months before the original Bitcoin announcement, he seemed to have reached a tipping point in his work, and publicly asked for help “coding up [a demonstration]” [0].
  • Days after the publication of the original Bitcoin whitepaper, NS went and post-dated all public mentions of his research so as to appear posterior to the Bitcoin announcement [1] [2].
  • The original Bitcoin paper makes no mention of NS’s research, even though it is largely based on it. Instead, it cites a few people who appear to have inspired NS’s research [3].
  • The Bitcoin paper was initially released on the Cryptography Mailing List, which NS was very familiar with. It took the form of a scientific paper, respectful of all academic conventions, which points to an academic.
  • NS became completely silent about his research after asking for help in April 2008.
  • When asked about the identity of Satoshi Nakamoto, NS made a reply that implied he knew his identity [4]. NS never replied when asked directly if he was Satoshi Nakamoto.
  • The timing of the forum posts of Satoshi Nakamoto indicates he was located in the EST timezone, the same as NS [5].
  • An analysis of the content-neutral expressions found in the Bitcoin whitepaper indicates a match with NS’s writing tics, at a level that only has a one in a thousand chance to be a coincidence [6].

Let us pause here. I have received comments from people concerned that the analysis was flawed. I would argue that the analysis is correct, if you accept its underlying hypotheses (which were detailed in the original post). Let us review the hypotheses of the model, and see if they make sense:

  • (1) We assume that all researchers have a “vocabulary set” of content-neutral expressions that they use in their papers. These sets vary from researcher to researcher.
  • (2) We assume that if an expression is in a researcher’s vocabulary, this researcher will use it at least once in 10 papers (this hypothesis is very generous, as in fact researchers tend to reuse the same expressions all the time).
  • (3) We assume that the uses of each expression are statistically independent.

We then considered 4 rare content-neutral expressions, and assessed how frequently they were used in cryptography papers. This was done using Google Scholar. The frequencies obtained were 0.01, 0.05, 0.015, and 0.01.

Therefore the probabilities that a researcher would have these expressions as part of their vocabulary are, according to (2), respectively 0.1, 0.5, 0.15, and 0.1.

Therefore the joint probability that a random cryptography researcher would have all of these expressions as part of their paper-writing toolbox is ~1e-3 (according to (3)). Nick Szabo uses these 4 expressions frequently in his papers and essays. Therefore he is one in a thousand who could have written a paper using all of these expressions.
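To make the arithmetic explicit, here is a minimal sketch of the computation under hypotheses (2) and (3). The variable names are mine, and the frequencies are the four values quoted above.

```python
from math import prod

# Share of cryptography papers containing each of the four expressions
# (the Google Scholar frequencies quoted above).
paper_frequencies = [0.01, 0.05, 0.015, 0.01]

# Hypothesis (2): a researcher who has an expression in their vocabulary
# uses it in at least 1 paper out of 10, so the share of researchers who
# "own" the expression is at most 10 times the share of papers.
PAPERS_PER_USE = 10

vocabulary_probabilities = [
    min(1.0, frequency * PAPERS_PER_USE) for frequency in paper_frequencies
]

# Hypothesis (3): the expressions are used independently, so the joint
# probability is simply the product of the individual probabilities.
joint_probability = prod(vocabulary_probabilities)

print(vocabulary_probabilities)
print(f"joint probability: {joint_probability:.2e}")
```

The product comes out at about 7.5e-4, which is the figure rounded to ~1e-3 in the text.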

I believe this reasoning is sound. If you disagree, I invite you to attack it, for instance by pointing out counter-examples that would show that (1), (2), or (3) are flawed.

Let us get back to our list of NS-related facts.

  • After the release of the Bitcoin proposal and then software, NS stayed silent for a long time about it, whereas it was the realization of his life project. One would have expected him to get at least a little excited about that.
  • An analysis of the stylometric characteristics of the Bitcoin whitepaper indicates a stronger match for Nick Szabo than for other researchers involved with cryptocurrencies, such as Wei Dai, Hal Finney, David Chaum or Adam Back.

Let us pause again here. This is a bit of research that I did not previously publish here. Let us review what I did, and the results I obtained.

I took extensive writing samples from Wei Dai, Adam Back, Hal Finney, David Chaum, and Nick Szabo. Samples were from 5k to 40k words long. I then computed histograms of word length frequency and character frequency, and compared these histograms with that of the original Bitcoin whitepaper. Here are my results. Units are arbitrary (smaller scores mean closer histograms).

Word length distribution

  1. Diff Nick Szabo / Bitcoin: 0.160
  2. Diff Wei Dai / Bitcoin: 0.241
  3. Diff David Chaum / Bitcoin: 0.257
  4. Diff Adam Back / Bitcoin: 0.337
  5. Diff Hal Finney / Bitcoin: 0.510

Character frequency distribution

  1. Diff Nick Szabo / Bitcoin: 0.191
  2. Diff Wei Dai / Bitcoin: 0.208
  3. Diff David Chaum / Bitcoin: 0.228
  4. Diff Hal Finney / Bitcoin: 0.284
  5. Diff Adam Back / Bitcoin: 0.342
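For readers who want to try a comparison of this kind, here is a rough sketch of how such histograms can be computed and compared. It is an illustration, not the exact procedure behind the scores above: the tokenization, the cap on word length, and the distance (sum of absolute differences) are all assumptions of mine, and the function names are hypothetical.

```python
from collections import Counter
import re
import string


def word_length_histogram(text, max_len=15):
    """Relative frequency of word lengths 1..max_len (longer words are capped)."""
    words = re.findall(r"[A-Za-z']+", text)  # apostrophes count toward length
    counts = Counter(min(len(word), max_len) for word in words)
    total = sum(counts.values()) or 1
    return [counts.get(n, 0) / total for n in range(1, max_len + 1)]


def character_histogram(text):
    """Relative frequency of the letters a-z, ignoring case and other characters."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters) or 1
    return [counts.get(c, 0) / total for c in string.ascii_lowercase]


def histogram_distance(a, b):
    """Sum of absolute differences: 0 for identical histograms, at most 2."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_candidates(reference_text, samples):
    """Return (name, (word_length_diff, character_diff)), closest word-length match first."""
    ref_words = word_length_histogram(reference_text)
    ref_chars = character_histogram(reference_text)
    scores = {
        name: (
            histogram_distance(word_length_histogram(text), ref_words),
            histogram_distance(character_histogram(text), ref_chars),
        )
        for name, text in samples.items()
    }
    return sorted(scores.items(), key=lambda item: item[1][0])
```

To use it, pass the whitepaper text as `reference_text` and a dictionary mapping each author's name to a long writing sample. Scores produced this way will not be on the same scale as the numbers listed above, since the units there are arbitrary.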

This analysis would need to be run against thousands of potential candidates (all researchers known to have worked on cryptocurrencies / proof-of-work algorithms / etc) in order to be truly significant. But it already confirms that Nick Szabo’s writing matches the Bitcoin whitepaper, not only in the expressions it uses, but also on hard-to-fabricate style metrics.

If you would like to help run this analysis at scale, or if you have evidence for or against the case of Nick Szabo being Satoshi Nakamoto (quite likely along with one or more technical collaborators), contact me at skye.grey@yandex.com. I will add any new evidence to this blog.

[0] http://unenumerated.blogspot.com/2008/04/bit-gold-markets.html#3741843833998921269

[1] http://unenumerated.blogspot.com/2005/12/bit-gold.html

[2] http://unenumerated.blogspot.com/2008/04/bit-gold-markets.html

[3] https://bitcoin.org/bitcoin.pdf

[4] https://twitter.com/AdrianChen/status/407542548844929025

[5] http://www.wired.com/magazine/2011/11/mf_bitcoin/all/

[6] https://likeinamirror.wordpress.com/2013/12/01/satoshi-nakamoto-is-probably-nick-szabo/


Satoshi Nakamoto is (probably) Nick Szabo

I recently became interested in identifying the pseudonymous creator of Bitcoin, Satoshi Nakamoto. I started from the Bitcoin whitepaper [0] published in late 2008, and proceeded to run reverse textual analysis –essentially, searching the internet for highly unusual turns of phrase and vocabulary patterns (in particular places which you would expect a cryptography researcher to contribute to), then evaluating the fitness of each match found by running textual similarity metrics on several pages of their writing.

Which led me rather directly to several articles from Nick Szabo’s blog.

For those who don’t know Nick Szabo and his documented links to Bitcoin: prior to the appearance of Bitcoin, Nick had been developing for several years (since 1998 [1]) the enabling mechanism for a decentralized digital currency, eventually converging on a system he called “bit gold” [3], which is the direct precursor to the Bitcoin architecture.

According to what seems to be a widely accepted origin story of Bitcoin, Satoshi Nakamoto was a highly skilled computer scientist (or group thereof) who found out about Nick’s proposition for bit gold, hit upon an idea for bettering it, published the Bitcoin whitepaper, and decided to turn it into reality by developing the original Bitcoin client. Nick denies being Satoshi, and has stated his official opinion on Satoshi and Bitcoin in a May 2011 article [1].

I would argue that Satoshi is actually Nick Szabo himself, probably together with one or more technical collaborators.

As I mention above, what originally led me to this hypothesis is that reverse-searching for content similar to the Bitcoin whitepaper led me to Nick’s blog, completely independently of any knowledge of the official Bitcoin story. I must stress this: an open, unbiased search of texts similar in writing to the Bitcoin whitepaper over the entire Internet identifies Nick’s bit gold articles as the best candidates. It could still be a coincidence, although an unlikely one: since cryptocurrencies were a fairly niche topic in 2008 and earlier, every contributor to the field was going to be reusing the same shared expressions and vocabulary. Satoshi would have been a reader of Nick’s blog, so you would expect him to describe the same concepts in a similar way. But there’s more.

Running similarity metrics on the whitepaper and Nick’s bit gold articles as well as his paper “formalizing and securing relationships on public networks” [2] indicated an excellent match over content-neutral expressions as well –so either Nick wrote the whitepaper, or it was written by somebody imitating Nick’s writing style. Here is a brief summary of some of the more salient common points. For each expression, when it is possible and relevant, we will mention the proportion of cryptography papers containing the expression (using Google Scholar), to measure how common its use is among researchers, and later provide a rough value for the probability of the null hypothesis. Of course, we’ll only do this for content-neutral expressions.

Content-neutral terms:

  • Repeated use of “of course” without isolating commas, contrary to convention (“the problem of course is”)
  • Expression “can be characterized”, frequent in Nick’s blog (found in 1% of crypto papers)
  • Use of “for our purposes” when describing hypotheses (found in 1.5% of crypto papers)
  • Starting sentences with “It should be noted” (found in 5.25% of crypto papers)
  • Use of “preclude” (found in 1.5% of crypto papers)
  • Expression “a level of” + noun (“achieves a level of privacy by…”) as a standalone qualifier

Content-bearing terms that have common synonyms in the field and thus could easily have been expressed in a different way:

  • Expression “timestamp server”, central in the Bitcoin paper, used in Nick’s blog as early as January 2006
  • Repeated use of expression “trusted third party”
  • Expressions “cryptographic proof” and “digital signatures”
  • Repeated use of “timestamp” as a verb
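As an illustration of how expressions like these can be counted in a writing sample, here is a small sketch. The regular expressions are my own approximations (for instance, “of course” counts as lacking isolating commas when it is neither directly preceded nor directly followed by a comma), and the names are hypothetical.

```python
import re

# Approximate patterns for some of the content-neutral expressions above.
# These are illustrative assumptions, not a definitive specification.
PATTERNS = {
    "of course (no commas)": re.compile(
        r"(?<!,\s)\bof course\b(?!\s*,)", re.IGNORECASE
    ),
    "can be characterized": re.compile(r"\bcan be characterized\b", re.IGNORECASE),
    "for our purposes": re.compile(r"\bfor our purposes\b", re.IGNORECASE),
    "it should be noted": re.compile(r"\bit should be noted\b", re.IGNORECASE),
    "preclude": re.compile(r"\bpreclud(?:e|es|ed|ing)\b", re.IGNORECASE),
}


def count_expressions(text):
    """Count occurrences of each pattern after collapsing runs of whitespace."""
    normalized = " ".join(text.split())
    return {name: len(pattern.findall(normalized)) for name, pattern in PATTERNS.items()}


sample = (
    "The problem of course is that it should be noted, of course, that this "
    "would preclude reuse. For our purposes, the scheme can be characterized "
    "as simple."
)
counts = count_expressions(sample)
print(counts)
```

In the made-up sample sentence, only the first “of course” is counted, because the second one is set off by commas.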

Consider this: if we assume that, when a content-neutral expression is part of a researcher’s vocabulary, they use it in at least one in ten papers (for instance “for our purposes” appears in 1.5% of papers, so we’ll assume that 15% of researchers would be liable to use it in a paper), then the probability of finding all of “it should be noted”, “for our purposes”, “can be characterized” and “preclude” as part of a given researcher’s vocabulary has the upper bound 0.08%. That’s our p-value right there (8e-4): this particular combination could pinpoint one researcher in a thousand.

Of course, the “one in ten papers” hypothesis is purely arbitrary, so it’s up to you to judge if it is acceptable. It seems rather generous to me, as most researchers actually tend to constantly reuse the same handful of expressions.

In short: most of the unusual wording found in the Bitcoin whitepaper also recurs in Nick’s articles. Not all of it, though: the Britishism “favour” used by Satoshi is not used by Nick, who writes “favor”. However, the Bitcoin paper may have had several authors, Nick being merely the main one. In fact, since the entire paper is written in American English except for this one word, it is highly probable that either the paper had several authors, or this one word was a deliberate attempt at adding confusion as to the origins of the paper.

Then, there is secondary evidence. It is obvious that Satoshi did extensive research about prior mentions of concepts similar to Bitcoin, as any proper scientist writing a paper would have. This is evidenced by Satoshi’s reference to Wei Dai’s b-money, as well as hashcash, even though neither of them seems to have been a direct inspiration for Bitcoin. However, he made no mention of Nick Szabo’s bit gold, whereas Bitcoin is quite visibly built directly on top of the bit gold ideas. If Satoshi had been writing independently from Nick, wouldn’t he have cited his work as per proper scientific etiquette?

There is also the remarkable lack of public reaction on Nick’s part when Bitcoin started taking off. For somebody as deeply involved in these concepts as Nick, it strikes me as surprising that it took Nick many months to even mention Bitcoin, while his ideas were coming to life in an exciting way.

Another interesting fact that may or may not be significant, is that the main mentions of bit gold on Nick’s blog have been retroactively post-dated to appear as slightly posterior to the Bitcoin whitepaper, and this right after the publication of the whitepaper. There are two major articles on bit gold on Nick’s blog, one originally posted in December 2005 and post-dated to December 2008 [3], and one from April 2008, also post-dated to December 2008 [4] (note: it is possible to manually edit the dates of blog posts on Blogger, however the original date is still visible in the (uneditable) url of the posts).
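The check described in the note above can be sketched in a few lines. This assumes, as stated, that the URL path keeps the original year and month of publication; the function names are hypothetical, and the displayed date is passed as a (year, month) pair since only the month is given here.

```python
import re

# Blogger-style path: /YYYY/MM/slug.html (an optional #fragment may follow).
BLOGGER_PATH = re.compile(r"/(\d{4})/(\d{2})/[^/]+\.html")


def url_month(url):
    """Return (year, month) embedded in the post URL, or None if absent."""
    match = BLOGGER_PATH.search(url)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_post_dated(url, displayed_year, displayed_month):
    """True if the displayed date is later than the month embedded in the URL."""
    original = url_month(url)
    if original is None:
        return False
    return (displayed_year, displayed_month) > original


bit_gold_url = "http://unenumerated.blogspot.com/2005/12/bit-gold.html"
print(url_month(bit_gold_url))
print(is_post_dated(bit_gold_url, 2008, 12))
```

Applied to the two bit gold articles, the URL months are December 2005 and April 2008, both earlier than the displayed date of December 2008.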

It is unclear why this post-dating occurred –it cannot really be an effort to confuse the dating of the bit gold system, since it is widely documented to have been publicized prior to 2008 (and again, Nick asserted once that he had started working on the idea as early as 1998 [1]). I would guess that, shortly after the publication of the Bitcoin whitepaper, Nick found something to edit in both of his bit gold articles.

Lastly, one thing to consider is that the profiles of Nick and Satoshi match perfectly. Satoshi is highly likely to have an academic background (Nick is a professor with a significant publication history), as demonstrated by his mastery of scientific writing –writing a paper following proper scientific convention is something difficult to improvise if you haven’t already done it a few times. In fact, the whole idea of getting an idea out there by writing a scientific paper, of all things, is very academically-minded. And the idea of a decentralized digital currency was a central project of Nick’s, that only a handful of people were interested in around the time of publication of the whitepaper. Who was on the 2008 list of academics passionate about cryptocurrencies and who wrote like Nick Szabo? Nick Szabo.

In summary, it seems to me highly likely that Satoshi is Nick (and collaborators). At the very least, there is strong textual analysis evidence that Nick has written significant parts of the Bitcoin whitepaper. I would suppose that either Nick, wanting to get his long-time dream of a decentralized currency further, had contacted one or more technical collaborators that helped him address the shortcomings of bit gold and ship the first client under the collective name Satoshi Nakamoto, or that a brilliant engineer happened to hit upon a better solution for bit gold, contacted Nick, and they decided to bring it to life together. As a side-note, it seems much more likely that a Satoshi-like character inventing Bitcoin would first contact the original father of the project, rather than start devoting all of their resources to shipping what was largely somebody else’s pet idea.

The scenario in which Szabo goes to a technically-minded computer scientist to get help turning bit gold into a reality is strongly backed by the fact that in April 2008, just a few months before the announcement of Bitcoin, Nick was actively looking for collaborators on the bit gold project. He asks on his blog [5]:

“[bit gold] would greatly benefit from a demonstration, an experimental market (with e.g. a trusted third party substituted for the complex security that would be needed for a real system). Anybody want to help me code one up?”

So, after 10 years of thinking about bit gold, Nick becomes interested in producing a concrete implementation of his decentralized currency dream. What happens right after? The Bitcoin whitepaper and software.

Then again, keep in mind that these two scenarios are pure speculation on my part –the only thing that I do have serious evidence for is merely the authorship of the Bitcoin whitepaper.

[0] http://bitcoin.org/bitcoin.pdf

[1] http://unenumerated.blogspot.com/2011/05/bitcoin-what-took-ye-so-long.html

[2] http://firstmonday.org/ojs/index.php/fm/article/view/548/469

[3] http://unenumerated.blogspot.com/2005/12/bit-gold.html

[4] http://unenumerated.blogspot.com/2008/04/bit-gold-markets.html

[5] http://unenumerated.blogspot.com/2008/04/bit-gold-markets.html#3741843833998921269