Reverse Engineering the Source Code of the BioNTech/Pfizer SARS-CoV-2 Vaccine

Welcome! In this post, we’ll be taking a character-by-character look at the
source code of the BioNTech/Pfizer SARS-CoV-2 mRNA vaccine.

I want to thank the large cast of people who spent time previewing this
article for legibility and correctness. All mistakes remain mine though,
but I would love to hear about them quickly at or

Now, these words may be somewhat jarring: the vaccine is a liquid that gets
injected in your arm. How can we talk about source code?

This is a good question, so let’s start off with a small part of the very
source code of the BioNTech/Pfizer vaccine, also known as
BNT162b2, also
known as Tozinameran, also known as

First 500 characters of the BNT162b2 mRNA. Source: World Health Organization

The BNT162b2 mRNA vaccine has this digital code at its heart. It is 4284
characters long, so it would fit in a bunch of tweets. At the very
beginning of the vaccine production process, someone uploaded this code to a
DNA printer (yes), which then converted the bytes on disk to actual DNA.

A Codex DNA BioXp 3200 DNA printer

Out of such a machine come tiny quantities of DNA, which after a lot of
biological and chemical processing end up as RNA (more about which later) in
the vaccine vial. A 30 microgram dose turns out to actually contain 30
micrograms of RNA. In addition, there is a clever lipid (fatty) packaging
system that gets the mRNA into our cells.

RNA is the volatile ‘working memory’ version of DNA. DNA is like the flash
drive storage of biology. DNA is very durable, internally redundant and
very reliable. But much like computers do not execute code directly from a
flash drive, before something happens, code gets copied to a faster,
more versatile yet far more fragile system.

For computers, this is RAM; for biology it is RNA. The resemblance is
striking. Unlike flash memory, RAM degrades very quickly unless lovingly
tended to. The reason the Pfizer/BioNTech mRNA vaccine must be stored in the
deepest of deep freezers is the same: RNA is a fragile flower.

Each RNA character weighs on the order of 0.53·10⁻²¹ grams, meaning
there are 6·10¹⁶ characters in a single 30 microgram vaccine dose.
Expressed in bytes, this is around 25 petabytes, although it must be said
this consists of around 2000 billion repetitions of the same 4284
characters. The actual informational content of the vaccine is just over a
kilobyte. SARS-CoV-2 itself weighs in at around 7.5 kilobytes.
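For those who like to check the numbers, here is a minimal Python sketch of the per-dose and per-copy arithmetic. The per-character mass, dose and vaccine length come from the text. The 2 bits per character follows from there being four letters. The SARS-CoV-2 length of 30,000 characters is an assumption, picked because it is what the 7.5 kilobyte figure implies.

```python
# Back-of-the-envelope arithmetic, using the figures quoted in the text.
GRAMS_PER_CHARACTER = 0.53e-21  # approximate mass of one RNA nucleotide
DOSE_GRAMS = 30e-6              # one 30 microgram dose
VACCINE_CHARACTERS = 4284       # length of the BNT162b2 mRNA
GENOME_CHARACTERS = 30_000      # assumption: rough SARS-CoV-2 genome length
BITS_PER_CHARACTER = 2          # four possible letters -> log2(4) = 2 bits

characters_per_dose = DOSE_GRAMS / GRAMS_PER_CHARACTER
vaccine_bytes = VACCINE_CHARACTERS * BITS_PER_CHARACTER / 8
genome_bytes = GENOME_CHARACTERS * BITS_PER_CHARACTER / 8

print(f"characters per dose: {characters_per_dose:.1e}")  # 5.7e+16
print(f"vaccine information: {vaccine_bytes:.0f} bytes")   # 1071 bytes
print(f"virus information:   {genome_bytes:.0f} bytes")    # 7500 bytes
```

The 1071 bytes are the ‘just over a kilobyte’ of actual information; everything else in the dose is repetition.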

The briefest bit of background

DNA is a digital code. Unlike computers, which use 0 and 1, life uses A, C, G
and U/T (the ‘nucleotides’, ‘nucleosides’ or ‘bases’).

In computers we store the 0 and 1 as the presence or absence of a charge, or as a
current, as a magnetic transition, or as a voltage, or as a modulation of a
signal, or as a change in reflectivity. Or in short, the 0 and 1 are not some
kind of abstract concept: they live as electrons and in many other physical forms.

In nature, A, C, G and U/T are molecules, stored as chains in DNA (or RNA).

In computers, we group 8 bits into a byte, and the byte is the typical unit
of data being processed.

Nature groups 3 nucleotides into a codon, and this codon is the typical unit
of processing. A codon contains 6 bits of information (2 bits per DNA
character, 3 characters = 6 bits. This means 2⁶ = 64 different codon values).
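To make the ‘2 bits per character, 6 bits per codon’ idea concrete, here is a small Python sketch that packs a codon into a 6-bit integer. The particular letter-to-bits assignment is a hypothetical choice; any one-to-one mapping of the four letters to 00, 01, 10 and 11 works equally well.

```python
from itertools import product

# Hypothetical 2-bit encoding; any one-to-one assignment of letters to bits works.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "U": 0b11}


def codon_to_int(codon: str) -> int:
    """Pack a three-letter codon into a 6-bit integer (0..63)."""
    codon = codon.upper().replace("T", "U")  # treat DNA 'T' like RNA 'U'
    if len(codon) != 3:
        raise ValueError(f"a codon has exactly 3 characters, got {codon!r}")
    value = 0
    for base in codon:
        value = (value << 2) | BASE_TO_BITS[base]  # shift in 2 bits per character
    return value


all_values = {codon_to_int("".join(c)) for c in product("ACGU", repeat=3)}
print(len(all_values), min(all_values), max(all_values))  # 64 0 63
print(codon_to_int("AUG"))  # 14, binary 00 11 10
```

Every one of the 64 possible codons lands on a distinct value between 0 and 63, which is exactly what 6 bits can hold.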

Pretty digital so far. When in doubt, head to the WHO document
with the
digital code to see for yourself.

Some further reading is available:
this link (‘What
is life’) may help make sense of the rest of this page. Or, if you like
video, I have two hours for you.

So what does that code DO?

The idea of a vaccine is to teach our immune system how to fight a pathogen,
without us actually getting ill. Historically this has been done by
injecting a weakened or incapacitated (attenuated) virus, plus an ‘adjuvant’
to scare our immune system into action. This was a decidedly analogue
technique involving billions of eggs (or insects). It also required a lot
of luck and loads of time. Sometimes a different (unrelated) virus was also used.

An mRNA vaccine achieves the same thing (‘educate our immune system’) but in
a laser-like way. And I mean this in both senses: very narrow but also
very powerful.

So here is how it works. The injection contains volatile genetic material
that describes the famous SARS-CoV-2 ‘Spike’ protein. Through clever
chemical means, the vaccine manages to get this genetic material into some of
our cells.

These then dutifully start producing SARS-CoV-2 Spike proteins in large
enough quantities that our immune system springs into action. Confronted
with Spike proteins, and (importantly) tell-tale signs that cells have been
taken over, our immune system develops a powerful response against multiple
aspects of the Spike protein AND the production process.

And this is what gets us to the 95% effective vaccine.

The source code!

Let’s start at the very beginning, a very good place
to start. The WHO document has this
lovely picture:

This is a kind of table of contents. We’ll start with the ‘cap’, actually
depicted as a little hat.

Much like you can’t just plonk opcodes in a file on a computer and run it,
the biological operating system requires headers, has linkers and things
like calling conventions.

The code of the vaccine starts with the following two nucleotides:

GA

This can be compared very much to every DOS and Windows executable starting
with MZ, or UNIX scripts starting with #!. In both life and
operating systems, these two characters are not executed in any way. But
they have to be there because otherwise nothing happens.
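The computing side of this analogy fits in a few lines of Python: a loader looks at the first two bytes to decide what it is dealing with, without executing them. This is only a sketch; the table covers just the two headers named above and is not a real file-type database.

```python
from typing import Optional

# Two-byte "magic" prefixes mentioned in the text. Illustrative, not exhaustive.
MAGIC_PREFIXES = {
    b"MZ": "DOS/Windows executable",
    b"#!": "UNIX script",
}


def identify(data: bytes) -> Optional[str]:
    """Return a description if data starts with a known two-byte header."""
    return MAGIC_PREFIXES.get(data[:2])


print(identify(b"#!/bin/sh\necho hi\n"))  # UNIX script
print(identify(b"MZ\x90\x00"))            # DOS/Windows executable
print(identify(b"hello"))                 # None
```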

The mRNA ‘cap’ has a number of
functions. For one, it marks code as coming
from the nucleus. In our case of course it doesn’t, our code comes from a
vaccination. But we don’t need to tell the cell that. The cap makes our code
look legit, which protects it from destruction.

The initial two GA nucleotides are also chemically slightly different from
the rest of the RNA. In this sense, the GA has some out-of-band
signaling on it.

The “five-prime untranslated region”

Some lingo here. RNA molecules can only be read in one direction.
Confusingly, the part where the reading begins is called the 5’ or
‘five-prime’. The reading stops at the 3’ or three-prime end.

Life consists of proteins (or things made by proteins). And these proteins
are described in RNA. When RNA gets converted into proteins, this is called
translation.

Here we have the 5’ untranslated region (‘UTR’), so this bit does not end up
in the protein:


Here we encounter our first surprise. The normal RNA characters are A, C, G
and U. In DNA, the role of U is played by ‘T’. But here we find a Ψ: what is going on?

This is one of the exceptionally clever bits about the vaccine. Our body
runs a powerful antivirus system (“the original one”). For this reason,
cells are extremely unenthusiastic about foreign RNA and try very hard to
destroy it before it does anything.

This is somewhat of a problem for our vaccine: it needs to sneak past our
immune system. Over many years of experimentation, it was found that if the
U in RNA is replaced by a slightly modified molecule, our immune system
loses interest. For real.

So in the BioNTech/Pfizer vaccine, every U has been replaced by
1-methyl-3’-pseudouridylyl, denoted by Ψ. The really clever bit is that
although this replacement Ψ placates (calms) our immune system, it is
accepted as a normal U by the relevant parts of the cell.
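As a string operation, the substitution is trivial; here is a Python sketch of it in both directions. Mapping Ψ back to U is handy when comparing vaccine and virus sequences side by side. The input string is an arbitrary ten-character example, not a claim about a specific stretch of the vaccine.

```python
PSI = "Ψ"  # stands for 1-methyl-3'-pseudouridylyl in the vaccine sequence


def to_vaccine_alphabet(rna: str) -> str:
    """Replace every U by Ψ, mirroring the substitution described above."""
    return rna.replace("U", PSI)


def to_plain_rna(rna: str) -> str:
    """Map Ψ back to U, e.g. to compare vaccine and virus sequences."""
    return rna.replace(PSI, "U")


modified = to_vaccine_alphabet("GAGAAUAAAC")
print(modified)                # GAGAAΨAAAC
print(to_plain_rna(modified))  # GAGAAUAAAC
```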

In computer security we also know this trick: it is sometimes possible to
transmit a slightly corrupted version of a message that confuses firewalls and
security solutions, but that is still accepted by the backend servers,
which can then get hacked.

We are now reaping the benefits of fundamental scientific research done
in the past. The researchers behind this Ψ technique had to fight to get their
work funded and then accepted. We should all be very grateful, and I am sure
the Nobel prizes will arrive in due course.
Many people have asked: could viruses also use the Ψ technique to beat our
immune systems? In short, this is extremely unlikely. Life simply does
not have the machinery to build 1-methyl-3’-pseudouridylyl nucleotides.
Viruses rely on the machinery of life to reproduce themselves, and this
facility is simply not there. The mRNA vaccines quickly degrade in the
human body, and there is no possibility of the Ψ-modified RNA
replicating with the Ψ still in there. “No, Really, mRNA Vaccines Are Not Going To Affect Your

is also a good read.

OK, back to the 5’ UTR. What do these 51 characters do? As with everything in
nature, almost nothing has one clear role.

When our cells need to translate RNA into proteins, this is done using a
machine called the ribosome. The ribosome is like a 3D printer for
proteins. It ingests a strand of RNA and, based on that, it emits a string of
amino acids, which then fold into a protein.

Source: Wikipedia user Bensaccount

This is what we see happening above. The black ribbon at the bottom is RNA.
The ribbon appearing in the green bit is the protein being formed. The
things flying in and out are amino acids plus adaptors to make them fit on
the RNA.

This ribosome needs to physically sit on the RNA strand for it to get to
work. Once seated, it can start forming proteins based on further RNA it
ingests. From this, you can imagine that it can’t yet read the parts where
it lands first. This is just one of the functions of the UTR: the
ribosome landing zone. The UTR provides ‘lead-in’.
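Here is a toy ribosome in Python. It assumes, as a simplification, that reading starts at the first AUG (the standard start codon) after the lead-in and continues three characters at a time until a stop codon. The codon table is the standard genetic code, built from the usual 64-letter string in U, C, A, G order. The input is a made-up fragment, not the real vaccine sequence.

```python
from itertools import product

# Standard genetic code: first base varies slowest, third base fastest.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Toy ribosome: skip the lead-in, read codons from the first AUG to a stop."""
    rna = rna.replace("Ψ", "U")  # Ψ is read as an ordinary U
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical lead-in "GGCCACC", then AUG UUC GUG UUC and a UGA stop.
print(translate("GGCCACCAΨGΨΨCGΨGΨΨCΨGA"))  # MFVF
```

A real ribosome is of course far more subtle about where it starts; the point here is only that the lead-in is skipped and the rest is read in groups of three.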

In addition to this, the UTR also contains metadata: when should translation
happen? And how much? For the vaccine, they took the most ‘right now’ UTR
they could find, taken from the alpha globin gene.
This gene is known to robustly produce a lot of proteins. In previous
years, scientists had already found ways to optimize this UTR even further
(according to the WHO document), so this is not quite the alpha globin UTR.
It’s better.

The S glycoprotein signal peptide

As noted, the goal of the vaccine is to get the cell to produce copious
amounts of the Spike protein of SARS-CoV-2. Up to this point, we have mostly
encountered metadata and “calling convention” stuff in the vaccine source
code. But now we enter actual viral protein territory.

We still have one layer of metadata to go, however. Once the ribosome (from the
lovely animation above) has made a protein, that protein still needs to go
somewhere. This is encoded in the “S glycoprotein signal peptide (extended leader
sequence)”.
The way to think of this is that at the beginning of the protein there is a kind
of address label, encoded as part of the protein itself. In this specific
case, the signal peptide says that this protein should exit the cell via the
“endoplasmic reticulum”. Even Star Trek lingo isn’t as fancy as this!

The “signal peptide” isn’t very long, but when we look at the code, there
are differences between the viral and vaccine RNA:

(Note that for comparison purposes, I have replaced the fancy modified Ψ by a
regular RNA U.)

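[Figure: the signal peptide RNA of the virus and of the vaccine, side by side in groups of three; a ‘3’ marks the third position of each codon and a ‘!’ marks each changed character]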

So what is going on? I have deliberately listed the RNA in groups of 3
letters. Three RNA characters make up a codon. And each codon encodes for a
specific amino acid. The signal peptide in the vaccine consists of exactly
the same amino acids as in the virus itself.

So how come the RNA is different?

There are 4³=64 different codons, since there are 4 RNA characters, and
there are three of them in a codon. Yet there are only 20 different
amino acids. This means that multiple codons encode for the same amino acid.
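We can let Python do the counting. This sketch builds the standard codon table from the usual 64-letter string (U, C, A, G order, first base varying slowest) and tallies how many codons map to each amino acid; ‘*’ stands for the stop codons.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons encode each amino acid (and how many mean "stop").
codons_per_amino_acid = Counter(CODON_TABLE.values())
amino_acids = sorted(aa for aa in codons_per_amino_acid if aa != "*")

print(len(CODON_TABLE))  # 64 codons
print(len(amino_acids))  # 20 amino acids
print(codons_per_amino_acid["L"], codons_per_amino_acid["M"], codons_per_amino_acid["*"])  # 6 1 3
```

Leucine (L) has six codons while methionine (M) has just one, so there is plenty of room to swap codons without changing the protein.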

Life uses the following nearly universal table for mapping RNA codons to
amino acids:

The RNA codon table (Wikipedia)

In this table, we can see that the modifications in the vaccine (UUU ->
UUC) are all synonymous. The vaccine RNA code is different, but the same
amino acids and the same protein come out.

If we look closely, we see that the majority of the changes happen in the
third codon position, marked with a ‘3’ above. And if we check the universal
codon table, we see that this third position indeed often does not matter
for which amino acid is produced.
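Here is a Python sketch of that check: walk two sequences codon by codon, and for each changed codon report whether the amino acid stays the same and which codon positions differ. The example fragment is illustrative, assembled from the codon pairs mentioned in this article (UUU -> UUC, CUU -> CUG, CCA -> CCU); it is not the real signal peptide.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def compare_codons(original: str, modified: str) -> list:
    """List each changed codon as (old, new, synonymous?, changed positions)."""
    if len(original) != len(modified) or len(original) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    changes = []
    for i in range(0, len(original), 3):
        a, b = original[i:i + 3], modified[i:i + 3]
        if a == b:
            continue
        positions = [p + 1 for p in range(3) if a[p] != b[p]]  # 1-based
        changes.append((a, b, CODON_TABLE[a] == CODON_TABLE[b], positions))
    return changes


# Illustrative fragment, not the real signal peptide.
for change in compare_codons("AUGUUUCUUCCA", "AUGUUCCUGCCU"):
    print(change)
# ('UUU', 'UUC', True, [3])
# ('CUU', 'CUG', True, [3])
# ('CCA', 'CCU', True, [3])
```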

So, the changes are synonymous, but then why are they there? Looking
closely, we see that all changes except one lead to more C and Gs.

So why would you do that? As noted above, our immune system takes a very dim
view of ‘exogenous’ RNA, RNA code coming from outside the cell. To evade
detection, the ‘U’ in the RNA was already replaced by a Ψ.

However, it turns out that RNA with a higher amount of Gs and Cs is
also converted more efficiently into proteins.

And this has been achieved in the vaccine RNA by replacing many characters
with Gs and Cs wherever this was possible.
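Counting the extra Gs and Cs is easy to sketch in Python. The codon pairs below are the ones mentioned in this article; the two nine-character strings are just those codons strung together, as an illustration.

```python
def gc_count(rna: str) -> int:
    """Number of G and C characters in a sequence."""
    return sum(1 for base in rna if base in "GC")


def gc_fraction(rna: str) -> float:
    """Share of G and C characters (0.0 for an empty sequence)."""
    return gc_count(rna) / len(rna) if rna else 0.0


for before, after in [("UUU", "UUC"), ("CUU", "CUG"), ("CCA", "CCU")]:
    print(before, "->", after, "extra G/C:", gc_count(after) - gc_count(before))
# UUU -> UUC extra G/C: 1
# CUU -> CUG extra G/C: 1
# CCA -> CCU extra G/C: 0

print(round(gc_fraction("UUUCUUCCA"), 2), round(gc_fraction("UUCCUGCCU"), 2))  # 0.33 0.56
```

The CCA -> CCU swap is the odd one out: it is synonymous, but it adds no G or C.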

I’m slightly intrigued by the one change that did not lead to an
additional C or G, the CCA -> CCU modification. If anyone knows the reason,
please let me know! Note that I’m aware that some codons are more common
than others in the human genome, but I also read that this does not
influence translation speed much.

The actual Spike protein

The next 3777 characters of the vaccine RNA are similarly ‘codon optimized’
to add a lot of C’s and G’s. In the interest of space I won’t list all
the code here, but we are going to zoom in on one exceptionally special
bit. This is the bit that makes it work, the part that will actually help us
return to life as normal:

          Virus:   L   D   K   V   E   A   E   V   Q   I   D   R   L   I   T   G
          Vaccine: L   D   P   P   E   A   E   V   Q   I   D   R   L   I   T   G

Here we see the usual synonymous RNA changes. For example, in the first
codon we see that CUU is changed into CUG. This adds another ‘G’ to the
vaccine, which we know helps boost protein production. Both CUU
and CUG encode for the amino acid ‘L’ or Leucine, so nothing changed in the
protein.

When we compare the full Spike protein in the vaccine, all changes are
synonymous like this, except for two, and this is what we see here.

The third and fourth codons above represent actual changes. The K and V
amino acids there are both replaced by ‘P’ or Proline. For ‘K’ this required
three changes (‘!!!’) and for ‘V’ it required only two (‘!!’).
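Spotting the non-synonymous changes at the protein level takes only a few lines of Python. This sketch uses the familiar old-position-new notation for amino acid substitutions. Positions here are relative to the 16-amino-acid fragment shown above, not to the full Spike protein.

```python
def substitutions(original: str, modified: str) -> list:
    """Describe amino acid changes as <old><position><new>, e.g. 'K3P'.

    Positions are 1-based and relative to the fragment passed in.
    """
    if len(original) != len(modified):
        raise ValueError("protein fragments must have equal length")
    return [
        f"{old}{pos}{new}"
        for pos, (old, new) in enumerate(zip(original, modified), start=1)
        if old != new
    ]


virus = "LDKVEAEVQIDRLITG"    # amino acids of the virus fragment shown above
vaccine = "LDPPEAEVQIDRLITG"  # the same fragment in the vaccine
print(substitutions(virus, vaccine))  # ['K3P', 'V4P']
```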

It turns out that these two changes enhance the vaccine’s efficiency.

So what is happening here? If you look at a real SARS-CoV-2 particle, you
can see the Spike protein as, well, a bunch of spikes:

SARS virus particles (Wikipedia)

The spikes are mounted on the virus body (‘the nucleocapsid protein’). But
the thing is, our vaccine is only generating the spikes themselves, and we’re
not mounting them on any kind of virus body.

It turns out that, unmodified, freestanding Spike proteins collapse into a
different structure. If injected as a vaccine, this would indeed cause our
bodies to develop immunity, but only against the collapsed spike protein.

And the real SARS-CoV-2 shows up with the spiky Spike. The vaccine would not
work very well in that case.

So what to do? In 2017 it was described how putting a double Proline
substitution in just the right
place
would make
S proteins hold their ‘pre-fusion’ configuration, even without being part of
the whole virus. This works because Proline is a very rigid amino acid. It
acts as a kind of splint, stabilising the protein in the state we need to
show to the immune system.

The people who
discovered this should be walking
around high-fiving themselves incessantly. Unbearable amounts of smugness
should be emanating from them. And it would all be well deserved.

Update! I have been contacted by the McLellan
lab, one of the
groups behind the Proline discovery. They tell me the high-fiving is
subdued because of the ongoing pandemic, but they are pleased to have
contributed to the vaccines. They also stress the importance of many other
groups, workers and volunteers.

The end of the protein, next steps

If we scroll through the rest of the source code, we encounter some small
modifications at the end of the Spike protein:

          Virus:   V   L   K   G   V   K   L   H   Y   T   s
          Vaccine: V   L   K   G   V   K   L   H   Y   T   s   s

At the end of a protein we find a ‘stop’ codon, denoted here by a lowercase
’s’. This is a polite way of saying that the protein should end here. The
original virus uses the UAA stop codon, the vaccine uses two UGA stop
codons, perhaps just for good measure.
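Here is a small Python sketch that counts the stop codons at the end of an in-frame coding sequence. The three stop codons (UAA, UAG, UGA) are those of the standard genetic code; the two example endings are hypothetical, built to mimic ‘H Y T’ followed by one UAA or two UGAs.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def trailing_stop_codons(coding_rna: str) -> int:
    """Count consecutive stop codons at the end of an in-frame coding sequence."""
    rna = coding_rna.replace("Ψ", "U")  # treat Ψ like U
    if len(rna) % 3:
        raise ValueError("length must be a multiple of 3")
    count = 0
    for end in range(len(rna), 0, -3):  # walk backwards one codon at a time
        if rna[end - 3:end] in STOP_CODONS:
            count += 1
        else:
            break
    return count


print(trailing_stop_codons("CAUUACACAUAA"))     # 1 (virus-style UAA)
print(trailing_stop_codons("CACUACACAUGAUGA"))  # 2 (vaccine-style UGA UGA)
```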

The 3’ Untranslated Region

Much like the ribosome needed some lead-in at the 5’ end, where we found the
‘five-prime untranslated region’, at the end of a protein we find a similar
construct called the 3’ UTR.

Many words could be written about the 3’ UTR, but here I will simply quote
one description: “The 3’-untranslated region plays a crucial role in gene
expression by influencing the localization, stability, export, and
translation efficiency of an mRNA .. despite our current understanding of
3’-UTRs, they are still relative mysteries”.

What we do know is that certain 3’-UTRs are very successful at promoting
protein expression. According to the WHO document, the BioNTech/Pfizer
vaccine 3’-UTR was picked from “the amino-terminal enhancer of split (AES)
mRNA and the mitochondrial encoded 12S ribosomal RNA to confer RNA stability
and high total protein expression”. To which I say, well done.


The very end of mRNA is polyadenylated. This is a fancy way of saying it
ends on a lot of AAAAAAAAAAAAAAAAAAA. Even mRNA has had enough of 2020, it
appears.

mRNA can be reused many times, but as this happens, it also loses some of
the A’s at the end. Once the A’s run out, the mRNA is no longer functional
and gets discarded. In this way, the ‘poly-A’ tail is protection from
degradation.

Studies have been done to find out what the optimal number of A’s at the end
is for mRNA vaccines. I read in the open literature that this peaked at 120
or so.

The BNT162b2 vaccine ends with:


That is 30 A’s, then a “10 nucleotide linker” (GCAUAUGACU), followed by another 70 A’s.
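The tail described above is easy to rebuild and take apart again in Python. This sketch assembles it from the three parts given in the text and then finds the long runs of A’s with a regular expression. The threshold of 10 A’s per run is an arbitrary choice that ignores the short A stretches inside the linker.

```python
import re

# The tail as described in the text: 30 A's, a 10-nucleotide linker, 70 A's.
LINKER = "GCAUAUGACU"
tail = "A" * 30 + LINKER + "A" * 70

# Runs of 10 or more A's (the threshold is an arbitrary choice).
runs = [len(m.group()) for m in re.finditer(r"A{10,}", tail)]

print(len(tail))        # 110
print(runs, sum(runs))  # [30, 70] 100
print(tail.strip("A"))  # GCAUAUGACU
```

Together the two runs add up to 100 A’s, interrupted once by the linker.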

I think that what we see here is the result of further

