Plan to Build a Genetic Noah’s Ark Includes a Staggering 66,000 Species

By George Dvorsky

An international consortium involving over 50 institutions has announced an ambitious project to assemble high-quality genome sequences of all 66,000 vertebrate species on Earth, including all mammals, birds, reptiles, amphibians, and fish. With an estimated total cost of $600 million (£457 million), it’s a project of biblical proportions.

It’s called the Vertebrate Genomes Project (VGP), and it’s being organised by a consortium called Genome 10K, or G10K. As its name implies, this group had initially planned to sequence the genomes of at least 10,000 vertebrate species, but now, owing to tremendous advances and cost reductions in gene sequencing technologies, G10K has decided to up the ante, aiming to sequence both a male and female individual from each of the approximately 66,000 vertebrate species on Earth.

Cofounders of the project announced the new goal yesterday at a press briefing during the opening session of the 2018 Genome 10K conference, currently under way at Rockefeller University in New York City. The project will involve over 150 experts from 50 institutions in 12 countries.

The announcement comes in tandem with the release of 14 new high-quality genomes for species representing all five vertebrate classes, including genomes of the greater horseshoe bat, Canadian lynx, platypus, Anna’s hummingbird, the kakapo parrot (of which there are only 150 surviving individuals), Goode’s desert tortoise, two-lined caecilian (a strange limbless amphibian that looks like a snake), and climbing perch. These 14 genomes, and those compiled over the course of the project, will be made available to scientists for the purposes of research.

Indeed, there’s more to VGP than just sequencing animal genomes. Like the Human Genome Project, this endeavour will undoubtedly produce breakthroughs in high-resolution sequencing and genome-assembly methods, while resulting in lower costs and fewer errors. The project will also address important questions in biology and disease, and make immediate impacts on the fields of evolution, genomics, and conservation biology. On that last point, a complete catalogue of Earth’s vertebrate species could serve as a safeguard against extinction — both in terms of preventing extinction, and possibly reviving extinct species in the future.

Speaking at the press conference yesterday, Oliver Ryder, a cofounder of G10K and a director at the San Diego Zoo Institute for Conservation Research, said VGP has the potential to “transform all realms of biology.” He said it will allow scientists to understand the reasons for extinction, including the presence of deleterious mutations, inbreeding, and genetic bottlenecks. As an example, Ryder described the discovery of a deleterious recessive gene among California condors, saying “we can now identify birds that are carriers of this lethal trait.” Ultimately, he believes the project will make us “better stewards of life on Earth” and enable us to “preserve our biological heritage.”

When G10K was launched 10 years ago, its members had no idea how long it would take to sequence genomes of sufficient quality to do good science and to do so affordably.

“I am incredibly excited that we’re now in a position to get it right,” David Haussler, a G10K cofounder and the director of the UC Santa Cruz Genomics Institute, said at yesterday’s meeting. “Now is really the time to get started,” he said, adding, “we have no excuse to not do this.”

To generate high-quality genome assemblies, the VGP team is emphasising “long-reads” over “short-reads,” which means sequencing technologies that produce longer chunks of contiguous genetic data will be favoured over those that produce shorter ones. This will make it considerably easier to assemble the DNA sequences into whole chromosomes. So instead of having to work with a jigsaw puzzle containing millions of pieces, the long-reads will result in a puzzle consisting of thousands of pieces.
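The jigsaw comparison comes down to simple division. The sketch below is only an illustration: the genome size of 3 billion bases (roughly human-sized) and the two average piece lengths are assumptions chosen for the example, not figures from the VGP, and `pieces_to_tile` is a hypothetical helper.

```python
def pieces_to_tile(genome_size: int, piece_length: int) -> int:
    """Minimum number of non-overlapping pieces needed to cover a genome once."""
    if genome_size <= 0 or piece_length <= 0:
        raise ValueError("genome_size and piece_length must be positive")
    # Integer ceiling division: a partial piece at the end still counts as one.
    return -(-genome_size // piece_length)


# Assumed values for illustration only (not taken from the VGP).
GENOME_SIZE = 3_000_000_000   # bases, roughly human-sized
SHORT_PIECE = 1_000           # assumed average piece length from short reads
LONG_PIECE = 1_000_000        # assumed average piece length from long reads

short_pieces = pieces_to_tile(GENOME_SIZE, SHORT_PIECE)
long_pieces = pieces_to_tile(GENOME_SIZE, LONG_PIECE)

print(f"short-read puzzle: {short_pieces:,} pieces")  # 3,000,000
print(f"long-read puzzle:  {long_pieces:,} pieces")   # 3,000
```

Under these assumptions, a thousandfold increase in average piece length turns a puzzle of millions of pieces into one of thousands.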

Also, the researchers will refrain from merging the chromosomes an individual inherits from its mother and father into a single sequence, a common practice that was resulting in far too many errors. Instead, the team will assemble the paternal and maternal DNA of each individual separately, in a process known as phasing. As Gene Myers, a VGP team member and a lead researcher at the Max Planck Institute of Molecular Cell Biology and Genetics, said yesterday, each species will be a “one and done” deal, meaning that the quality of the sequences will be so good that the work shouldn’t have to be repeated in the future. That way, “we can get on with the science,” he said.

In terms of process, the researchers will first assemble the long reads into contiguous chunks of chromosome called “contigs.” These chunks will be joined together to create even bigger pieces, called scaffolds, which will in turn be linked to others to create even larger assemblies, all the way to full-sized chromosomes. Optical DNA maps and computer algorithms will assist in the process, ensuring the proper sequential order and flagging any structural errors.
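The reads-to-contigs-to-scaffolds idea can be sketched with a toy example. This is not the VGP's pipeline: the greedy overlap merge is a textbook simplification, the reads are made up, and the function names are hypothetical. Real assemblers also have to cope with sequencing errors, repeats, and reads from either DNA strand, and they take the order of contigs from long-range data such as optical maps rather than receiving it as an input.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_reads(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the best-overlapping pair until no overlaps remain.

    The sequences left at the end play the role of contigs.
    """
    seqs = list(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return seqs
        merged = seqs[best_i] + seqs[best_j][best_k:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)


def scaffold(contigs: list[str], gap: int = 5) -> str:
    """Join contigs in the given order, marking each unknown gap with 'N's."""
    return ("N" * gap).join(contigs)


# Made-up reads: two pairs overlap, but the two resulting contigs do not.
reads = ["ATGGCGT", "GCGTACC", "TTTAAAG", "AAAGGGC"]
contigs = merge_reads(reads)
print(contigs)            # ['ATGGCGTACC', 'TTTAAAGGGC']
print(scaffold(contigs))  # ATGGCGTACCNNNNNTTTAAAGGGC
```

The runs of `N` stand for stretches whose order is known but whose sequence is not, which is the difference between a scaffold and a contig.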

“The advances in long-read sequencing and long-range scaffolding technologies [are] revolutionising de novo [starting from scratch] DNA sequencing,” said Myers. “After a 10-year hiatus, this trend inspired me to return to genome assembly as I believe we will ultimately be able to produce near-perfect, telomere-to-telomere genome reconstructions, and if current cost trends continue, for less than $1,000 (£762) on average per vertebrate species, thus dramatically altering the landscape of genomics.”

Indeed, it wasn’t too long ago that it cost millions of pounds and years of effort to complete the genome of a single animal. New sequencing technologies could soon make it possible to create an entire genome in a single week, said Adam Phillippy, G10K assembly chair and lead at the NIH’s National Human Genome Research Institute. It now costs about $30,000 (£22,000) to sequence the DNA of a new species for the first time.

The new sequences will be stored and made publicly available at the Genome Ark database, a digital open-access library of genomes. Corporate sponsors DNAnexus and Amazon Web Services “have been instrumental in getting this project off the ground,” said Phillippy.

“This project is outlandish and outrageous — but it’s feasible, and it’s inevitable,” Harris Lewin, a VGP team member from UC Davis, said at the press briefing.

Around $600 million (£457 million) will be needed to complete all phases of VGP, according to a G10K press release. To fund the project, G10K is acquiring money from private institutions and corporate sponsors. But the consortium is also doing some crowdfunding, having already collected $2.5 million (£1.9 million) out of the $6 million (£4.5 million) required for the first phase of the project (the first phase will involve the sequencing of at least one individual from all 260 orders of living vertebrates).
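For a sense of scale, the funding figures quoted above can be worked through in a few lines. The averages below assume the total is spread evenly across all species, which is a simplification for illustration rather than anything G10K has stated; the variable names are made up.

```python
# Figures as quoted in the article (US dollars).
TOTAL_COST = 600_000_000     # all phases of the VGP
SPECIES = 66_000             # approximate number of vertebrate species
PHASE1_TARGET = 6_000_000    # required for the first phase
PHASE1_RAISED = 2_500_000    # collected so far

# Assumption: the total is divided evenly across species.
cost_per_species = TOTAL_COST / SPECIES
phase1_fraction = PHASE1_RAISED / PHASE1_TARGET
phase1_shortfall = PHASE1_TARGET - PHASE1_RAISED

print(f"average cost per species: ${cost_per_species:,.0f}")  # $9,091
print(f"phase one funded so far:  {phase1_fraction:.1%}")     # 41.7%
print(f"phase one still needed:   ${phase1_shortfall:,}")     # $3,500,000
```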

All hyperbole aside, this is one of the most ambitious projects we’ve seen in a while, rivalling the Human Genome Project (HGP), the Human Connectome Project (an ongoing effort to map all the connections of the human brain), and the VGP’s sister project, the Earth BioGenome Project (EBP), which was announced earlier this year. The goal of the EBP is to sequence all eukaryotes (there are an estimated 8.7 million species on the planet), at an estimated cost of $4.7 billion (£3.5 billion). In an email to Gizmodo, a spokesperson for G10K said EBP will be functioning as the coordinating body, and VGP vertebrate genomes will be contributed to the overall effort to avoid duplication of work.

No timeline was given for the VGP, but as the HGP demonstrated, a slow start is not necessarily reflective of a project’s overall pace. As time passes, and as technologies and techniques improve, the VGP researchers should start to see accelerating returns, both in terms of speed and reductions in cost. Once complete, we’ll have a remarkable repository at our disposal, one even Noah would be proud of.

Featured Image: Make it Kenya/Flickr