HP3000-L Archives

July 2000, Week 1

HP3000-L@RAVEN.UTC.EDU

From: Wirt Atmar <[log in to unmask]>
Date: Sat, 1 Jul 2000 19:47:46 EDT

One of the things that tends to happen when I put my "wildly off-topic"
postings onto the list is that I tend to get a lot more private e-mail than
the responses that appear publicly on the list. After this last one regarding
the completion of the human genome map, I got about 15 private emails, of
which I've only answered about a third.

The remainder of the emails asked for particular details of the process, with
the most recent email asking the most basic questions -- and thus requiring
the most detailed answers of all. Let me try and answer the spirit of all of
the questions by answering this one:

> What in the world is the genome map????????  I know that the DNA is made up
> of sequences of nucleotides, Adenine and friends.  What we're looking for in
> the end are functional units, I believe--segments of the chain which cause
> skin color, height, allergic tendencies and so forth.  Those are called what?
> "Genes"?
>
> We have something called a map.  But that map doesn't appear to tell us
> where the genes are or what they do.  So what is it a map of?

The "genome" is simply the word that is used to represent the sum of all of
the code that it takes to build you (or any other species). It is the
totality of your inherited DNA, although to be technical, you have several
inherited genomes. The "big one" is the genome comprised of the DNA carried
in the nucleus of essentially every cell in your body -- but there are small,
auxiliary genomes held in the mitochondria (organelles present in virtually
every animal cell) or in the mitochondria and chloroplasts (organelles
present in virtually every plant cell). These two auxiliary genomes were once
the properties of free-living bacteria that became incorporated into -- and
eventually an obligate part of -- the more complex mechanism of the cells
characteristic of eukaryotes (protists, plants, animals, and fungi).

When people talk of decoding the human genome, what they specifically mean is
the DNA of the nucleus. You actually have two copies of the human genome in
every cell of your body, one inherited from your father and the other from
your mother. Having these two copies makes you a "diploid" organism, a
quality that you share with virtually every other plant or animal large
enough to be seen.

[Minor exceptions do exist in a very few cell types, such as red blood cells,
where no DNA is present. The nucleus and the other organelles are forced out
during cell development so that hemoglobin content can be maximized. Somewhat
similarly, gametes (sperm and eggs) contain only one copy of the genome].

The "genome map" that has just been completed is a sequencing of all 3.12
billion base pairs that comprise the human genome. It is essentially the
equivalent of listing every one and zero that comprises an operating system,
such as MPE or Linux, in order, as one long string, the only real difference
being that life evolved a quaternary coding system (G,C,A,T) rather
than the binary system (0,1) we use in computers.

If MPE were laid out this way, the "genome" map of MPE would look like

    0100101010001110010100101.....

while the human genome would appear as

   GCTTCAATATGCATATACCGATTACAAGAT.....

They are very much the same in this regard, but clearly in this form neither
map is particularly explanatory of what they ultimately mean.
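
To make the analogy concrete, here is a minimal Python sketch of how a
quaternary string maps onto a binary one. The particular 2-bit pattern
assigned to each base is an arbitrary assumption made for illustration, and
the function names are hypothetical; nothing in biology fixes this mapping.
The second function works out how much storage 3.12 billion base pairs would
need at two bits apiece.

```python
# Illustrative only: the bit pattern chosen for each base is arbitrary.
BASE_TO_BITS = {"G": "00", "C": "01", "A": "10", "T": "11"}


def dna_to_binary(seq):
    """Rewrite a string of bases as a string of 1's and 0's, 2 bits per base."""
    return "".join(BASE_TO_BITS[base] for base in seq.upper())


def genome_size_megabytes(base_pairs):
    """Storage needed for a sequence at 2 bits per base, in megabytes."""
    bits = base_pairs * 2
    return bits / 8 / 1_000_000


if __name__ == "__main__":
    print(dna_to_binary("GCTTCAATAT"))
    print(genome_size_megabytes(3_120_000_000), "MB")
```

At two bits per base, the whole sequence comes to roughly 780 megabytes,
about what fits on a single CD-ROM.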

Because everyone on this list is well familiar with computers, you are also
well-primed to understand genetics. Nonetheless, if there is any place to
make an early philosophical mistake, it's in giving too much credit to the
code itself. Every coding structure requires a processor, and it is only
through the process of having the processor read the code and act on it that
the code becomes "alive" and obtains value.

In modern computers, the processors have become relatively simple devices,
and thus are no longer a perfect analog of the complexity implicit in the
processing machinery evolved in living systems.

Rather, a better model would be to think back 200 years to the earliest
automated looms, where there was an enormous amount of machinery being
controlled by the simple 1's and 0's held on punched cards. With a modern
computer, you don't see much of the machinery anymore, but it's still there
in the cell and it's called metabolism, and it is enormously more complex
than any automated loom that Jacquard's cards controlled (even though
Jacquard's complexity shouldn't be underestimated; his most elaborate weaving
programs grew to the size of 10,000 punched cards, one executed continuously
after the other).

If you had to now, could you understand the coding structure and philosophy
of a Jacquard Loom? Absolutely. Code is code. And through mathematical
devices as simple as frequency analysis, taking the same basic tacks you
would use to decode encrypted text, you would very rapidly discover
repetitive start and stop points in the code that would lead you
eventually to understand their meaning, what operations they controlled, and
their structure in an inevitable regulatory hierarchy that tends to exist in
all machines, either human designed or randomly evolved.
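
As a rough sketch of what such a frequency analysis might look like, the
Python below counts every fixed-length substring in a sequence and reports
the ones that recur most often. The function names and the short test
sequence are invented for illustration; real analyses work on vastly longer
strings and must tolerate inexact repeats.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def most_common_kmers(seq, k, n=3):
    """Return the n most frequent substrings of length k with their counts."""
    return kmer_counts(seq, k).most_common(n)


if __name__ == "__main__":
    # Made-up sequence in which one 6-letter pattern appears twice.
    sample = "TATAATGCGTATAATCC"
    print(most_common_kmers(sample, 6, n=1))
```

Patterns that turn up far more often than chance would allow are the natural
candidates for punctuation: the start and stop marks of the code.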

A "gene" is a length of base pairs strung along the DNA chain that is large
enough (4000 to 400,000 base pairs) to encode a polypeptide. Most
polypeptides encoded by DNA are large enough to be called a protein, and most
proteins normally assume an enzymatic activity, although some proteins are
structural, such as collagen and keratin.

If the DNA from just one of your cells were unfurled and strung out in a
straight line, it would be six feet long. A "gene" represents only a
microscopic fragment of that length (about one to one ten millionth). On the
DNA string, we've learned to recognize the start point of a gene by looking
for what is called a "promoter" site. This promoter site virtually always
contains what is now called a "Pribnow box", a sequence of code (TATAAT) that
is amazingly well conserved (a term biologists use to mean that it is common
to all life on the planet, bacteria included). Indeed, the C,G,A,T coding
elements of DNA, when grouped into triplets called "codons", have been found
to carry the same meaning -- with very minor variations -- in all life, and
the code is thus spoken of as being a "universal code."
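
A small Python sketch may help here. It scans a string for the TATAAT
sequence and reads a stretch of code three letters at a time. The codon table
is only a five-entry excerpt of the standard genetic code (ATG for
methionine, TTT for phenylalanine, GGC for glycine, AAA for lysine, TAA as a
stop), the sequences are invented, and the function names are hypothetical.
Exact string matching is also a simplification; real promoter searches allow
for mismatches.

```python
# Five-entry excerpt of the standard genetic code; "*" marks a stop codon.
CODON_TABLE = {"ATG": "M", "TTT": "F", "GGC": "G", "AAA": "K", "TAA": "*"}


def find_motif(seq, motif="TATAAT"):
    """Return every position at which motif occurs in seq (overlaps allowed)."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def codons(seq):
    """Split seq into consecutive triplets, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(seq):
    """Translate triplets to one-letter amino acid codes until a stop codon."""
    protein = []
    for codon in codons(seq):
        amino_acid = CODON_TABLE.get(codon, "?")  # "?" = not in this excerpt
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(find_motif("GGTATAATCCTATAATA"))
    print(translate("ATGTTTGGCAAATAAGG"))
```

The same table, filled out to all 64 triplets, would serve for a bacterium
or a human with only minor changes, which is the whole point of calling the
code universal.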

This universality has profound connotations. It strongly implies that the
genetic code is extremely ancient and was probably evolved only once, and
almost immediately, as soon as self-replicating life became possible, 3.8 to
4 billion years ago. Indeed, the code is probably older than DNA itself, and
was most likely evolved in a precursor, RNA-only world.

Everyone has heroes, and I'm no exception. Three of mine, among many, are
Dmitri Mendeleev, Alfred Wegener, and Charles Darwin. All are heroes to me
for the same reason: they possessed the intellectual honesty to follow the
data before them and not let their personal biases too greatly color what
they were observing. But far more importantly, they had the intellectual
courage to come to extraordinary conclusions, conclusions they could not
fully justify at the time of their formation, but simply had to presume to be
true because no other explanation could as easily suffice.

Mendeleev had the extraordinary insight to notice that there were
regularities (periodicities) in the characteristics of the known chemical
elements. But more than that, he had the great courage to draw out his
"periodic" table of the elements, based on these characteristics, and leave
holes where no element was currently known to exist.

One of Mendeleev's own original papers, explaining the nature of these
periodicities, is on the web at:

     http://web.lemoyne.edu/~giunta/mendel.html

and it's worth reading. The discovery of the nature of the atom and the
evolution of quantum mechanical theory 50 years later not only made chemistry
easy and readily understandable, but events and subsequent theories obviously
proved Mendeleev to be exceptionally accurate in his conclusions.

Alfred Wegener's story is very much the same, other than his inability to
sway the scientific community during his lifetime. He too observed repeated
patterns that made one particular conclusion inescapable. I've written about
Wegener before, and a bit of his story, along with an excellent hyperlink,
can be found at:

      http://www.aics-research.com/history1.html

Wegener died on the Greenland ice cap trying to prove the correctness of his
theory of "continental drift" (now called "plate tectonics"). His work was
ultimately finished and published by his brother, Kurt, after his death.

Darwin was every bit as much of a mechanist (a physicist) as Mendeleev or
Wegener, and arrived at equally extraordinary and equally inescapable
conclusions. The ultimate conclusion that Darwin came to was that there was
only one form of life on the planet, and that it most likely had arisen only
once in the history of the Earth, and that the elaborate flowering of species
that we see today is understandable through the application of only a very
few mechanical "laws". Darwin's concluding paragraph in his 1859 "Origin of
Species" very precisely summarizes this view:

=======================================

It is interesting to contemplate an entangled bank, clothed with many plants
of many kinds, with birds singing on the bushes, with various insects
flitting about, and with worms crawling through the damp earth, and to
reflect that these elaborately constructed forms, so different from each
other, and dependent on each other in so complex a manner, have all been
produced by laws acting around us. These laws, taken in the largest sense,
being Growth with Reproduction; inheritance which is almost implied by
reproduction; Variability from the indirect and direct action of the external
conditions of life, and from use and disuse; a Ratio of Increase so high as
to lead to a Struggle for Life, and as a consequence to Natural Selection,
entailing Divergence of Character and the Extinction of less-improved forms.
Thus, from the war of nature, from famine and death, the most exalted object
which we are capable of conceiving, namely, the production of the higher
animals, directly follows. There is grandeur in this view of life, with its
several powers, having been originally breathed into a few forms or into one;
and that, whilst this planet has gone cycling on according to the fixed law
of gravity, from so simple a beginning endless forms most beautiful and most
wonderful have been, and are being, evolved.

========================================

What we have now, 140 years later, is massive, overwhelming evidence, held in
Oracle databases on DEC Alpha servers at the National Institutes of Health,
the Department of Energy and the Celera corporation, that Darwin was right.
The human genome isn't the only genome we're mapping. Eventually, we'll get
around to mapping representative samples from essentially every form of life
on the planet, and some of this work has already been either begun or
completed.

There's a very good, gross-level map of these phylogenetic (family tree)
relationships evident on an Australian website at:

     http://trishul.sci.gu.edu.au/~bharat/courses/ss13bmm/archaea.html

All of our evidence, taken from many different sources, indicates a
monophyletic (one-time) origin of life on this planet.

When I was in high school, in the early 1960's, there were three kingdoms,
Plantae, Animalia, and Bacteria, up from just the two Linnaeus had
originally proposed: Plantae and Animalia. By the time I finished college,
that number had been raised to five: Animalia, Plantae, Fungi, Protista, and
Monera (the fungi and the protozoans having been separated out of the plants
and animals, respectively, with the fungi having been discovered to be more
animal-like than plant-like). But that categorization has been recently
revised again, principally due to the work of Carl Woese at the Univ. of
Illinois, beginning in the 1970's. Three "domains" have now been defined:
Archaea, Bacteria, and Eucarya, and that's the classificatory scheme shown in
the Australian image above.

Archaea are as different from Bacteria as you are, although that fact was not
recognized until Woese began essentially single-handedly hammering that point
home thirty years ago. Woese's personal web page is at:

     http://www.life.uiuc.edu/micro/woese.html

The page is surprisingly modest for someone who is certain to win the Nobel
Prize in the very near future.

The Archaea are prokaryotes (a simple cell-internal structure), just as
bacteria are, but are otherwise quite different from the Bacteria. Many of
the Archaea are thermophilic (heat-loving) and halophilic (salt-loving).
Indeed, these organisms are life's extremists on this planet. Some species
can not only survive but thrive in boiling water, and prosper in environments
reminiscent of the very early Earth.

Is it likely we're going to discover organisms even older than the Archaea,
perhaps a still-extant species representative of the common progenitor of
these three Domains, called the "progenote", representative of a point in
time where the universal genetic code was first being realized? It's
possible, but probably highly unlikely. Nonetheless, one of the more
interesting questions yet to be resolved is the true nature of the progenote.
We have sufficient information now to make very good guesses as to its
construct, and indeed, we may be able to reconstruct its general evolution
within a few decades.

But what is of greater interest yet is that the progenote was defined by
Woese and Fox as being that last common ancestor to all life, but one in
which the genotype (the coding state space) and the phenotype (the behavioral
state space) had not yet separated, but were rather still one and the same.

If that sounds a little odd and mystical, it's not. During my tenure in
high-school (1959-1963), I worked for the RCA Service Company of Arizona,
repairing and aligning then-brand-new color television receivers. RCA began a
concerted effort to introduce color television to the United States in 1961,
although they began broadcasting limited shows in 1959.

Those first color TVs were wholly analog computers, assembled by hand and
composed of hundreds of discrete components, that performed their necessary
differential and integral calculus calculations through the electrical
analogies of resistance, induction and capacitance. The "computer" and its
"code" were right there in the wiring, in the network topology of the various
components. In this kind of circuit, there is no segregable difference
between the code and its behavior. They're one and the same. But that's no
longer true at all of RealPlayer on your PC. In RealPlayer, the genotypic and
phenotypic state spaces have become two highly distinct constructs. The image
arrives in an encrypted/compressed stream of 1's and 0's, and is processed
into an image by a general purpose processor, using the same basic alphabet
of processor op-codes that are capable of being programmed to do any one of a
hundred thousand distinctly different tasks. Code and behavior in this
environment are significantly different qualities.

In the 40 years since I worked at RCA, it's been extremely interesting to
watch the evolving segregation of code and behavior in television receivers,
a segregation that's essentially been completed now with the evolution of
HDTV and RealPlayer.

[For more information on the segregation of code and behavior in the
progenote, an excellent introductory web site can be found at:

     http://www.sp.uconn.edu/~gogarten/progenote/progenote.htm

It's worth taking the time to read. The word "cenancestor" appears in one of
the diagrams. The combining form "cen-" means "recent", although recent in
this context means 3.8 billion years or older.]

It is astounding how far we've come, so quickly, in our ability to first find
the genetic code, and now understand it. It was only 130 years ago when
Gregor Mendel -- an Augustinian monk who flunked his doctoral examination in
Vienna because he "did not understand the laws of inheritance" -- first
elaborated the "particulate nature" of genetic inheritance, analogizing his
imaginary "factors" (what we now call "genes") to the beads strung on a
rosary.

And it was only 50 years ago that James Watson, a young postdoctoral
biologist, and Francis Crick, a physicist turned biology graduate student
working in X-ray crystallography,
together first decoded the nature of the DNA molecule -- and proved that the
molecule was capable of carrying the enormous amount of information that
would be required to construct a living organism. And it was only 12 years
ago, an unbelievably short period of intervening time, that it had become
obvious that the entirety of the human genome could be decoded.

I've included the full text below of one of the thirty or so articles in the
NY Times that appeared this last week on the accomplishment. I thought that
this one was one of the more interesting.

The question remains: what value is this accomplishment? The value probably
cannot be overestimated. It's going to change our lives and our children's
lives dramatically, both materially and philosophically.

Wirt Atmar

========================================

June 27, 2000


READING THE BOOK OF LIFE
Double Landmarks for Watson: Helix and Genome

By NICHOLAS WADE


The genesis and history of the genome project has been intertwined to a
remarkable degree with the career of one man, Dr. James D. Watson.

With Dr. Francis Crick, Dr. Watson discovered the structure of DNA in 1953,
and later helped start the human genome project which, less than 50 years
later, is coming to fruition.

"I would only once have the opportunity to let my scientific career encompass
a path from the double helix to the three billion steps of the human genome,"
Dr. Watson wrote in explaining his decision to become the first director of
the human genome project office at the National Institutes of Health in 1988.
Announcing the results of the project yesterday, President Clinton
acknowledged Dr. Watson's contributions by telling him, "Thank you, sir," and
the audience of scientists and journalists broke into applause. The human
genome project may be the gateway to the biology and medicine of the 21st
century, but it was at first bitterly opposed by many academic biologists.

They believed that the interesting genes would come to light one by one in
the course of the research they were already doing, and that a federal
project to decode the whole genome would siphon money from their own budgets,
financed mostly by the National Institutes of Health.

The first serious proposals for decoding the human genome, according to
Robert Cook-Deegan in "Gene Wars" (W. W. Norton & Company, 1994), a history
of the genome project's early days, were made in 1985 by biologists like Dr.
Robert Sinsheimer of the University of California at Santa Cruz and Dr.
Walter Gilbert of Harvard. It was the Department of Energy that first picked
up their ideas in 1987, with the rationale that its nuclear radiation experts
needed to know whether the genome could be protected from mutation.

Academic biologists continued to scorn the project. Biology, in their view,
was a science based on clear-cut experiments, not on Big Science-style
extravaganzas that vacuumed up data just for the sake of it.

Dr. Watson's former mentor, Dr. Salvador Luria, wrote in a 1989 letter to
Science that the genome program "has been promoted without public discussion
by a small coterie of power-seeking enthusiasts."

With senior biologists lukewarm or hostile to the program, Dr. Watson was one
of the few leaders with the stature to quell opposition and guarantee that
the project would be scientifically rigorous.

Then director and now president of the Cold Spring Harbor Laboratory on Long
Island, he also helped persuade Congress to give the National Institutes of
Health the money to start its own genome venture.


Because of these efforts, Dr. Watson emerged as the obvious candidate to lead
the human genome project, and was appointed as the first director of the
agency's human genome program, a post he held from 1988 to 1992.

In the late 1980's the longest piece of DNA that had been decoded was a few
thousand units long. The DNA molecules in human chromosomes range from 40
million to 250 million units in length, presenting a different scale of
difficulty. The task was daunting but not insuperable if the chromosomes
could be broken down into smaller pieces for decoding and assembled through
some kind of chromosome map that would show how the pieces fit together.


Dr. Watson laid out the elements of what has been the public consortium's
strategy ever since.

He decreed that the project would be conducted at several universities, not
by some central administration, a move that allowed different initial
approaches to be tried.

It had the advantage of spreading out the money, winning political support.

Dr. Watson sought out international partners, particularly in Britain, where
the roundworm genome project had already begun, and vigorously campaigned for
Germany and Japan to join the project.

"It wouldn't be good if the Americans owned the genome," he said in an
interview last month. As a result of his efforts, the consortium now includes
laboratories in Britain, France, Germany, China and Japan.

Another stamp of his stewardship is the program of ethical and legal studies
about the genome, which at first took up 3 percent and now 5 percent of the
N.I.H.'s genome project budget.

Dr. Watson, long concerned that biology should dissociate itself from the
stain of the eugenic movement, announced the program as one of his first
official acts.

In an unusually accurate piece of technological forecasting, Dr. Watson
estimated the overall cost would be $3 billion and that the project could be
completed in 15 years from its official starting date of 1990.

The cost estimate assumed that methods of sequencing DNA would get rapidly
cheaper as technology improved. In fact the cost has fallen from $10 per unit
of DNA at the project's start to 4 cents a unit now.

In laying the basis for the enormous task of sequencing the human genome, Dr.
Watson's consortium eventually brought into being a powerful competitor, the
Celera Corporation.

The concept of Celera grew from a surprising source, a company then known as
Applied Biosystems, which made the principal brand of DNA sequencing machine
used by the consortium's centers.

In devising a new generation of the machines, the company's president,
Michael W. Hunkapiller, calculated it should be possible for a single,
industrial-scale center to start from scratch and decode the human genome
before the consortium did so.

Dr. Hunkapiller's idea required getting into competition with his own
customers. But it also meant doubling the market for his sequencing machines
and their chemical reagents.

To direct the project, he signed up Dr. J. Craig Venter, whose maverick
sequencing ideas regularly earned the disapproval of the academic
establishment yet often proved to work. The venture was backed by Tony White,
president of the Perkin Elmer Corporation, who quickly shed the company's
old-time instrument-making plants and committed it, as the PE Corporation, to
the brave but untested new world of genomics.

Celera was begun in May 1998, with Dr. Venter declaring he would complete the
genome by 2001. This was a bombshell to the public consortium, where Dr.
Watson had been replaced by Francis S. Collins, and the Wellcome Trust of
London, a powerful new medical charity that had become an important player by
financing the Sanger Center in England to decode one-third of the genome.

True to form, several of the consortium's experts on DNA sequencing
pronounced that Dr. Venter's proposed fast method for decoding the human
genome would not work. But Dr. Collins decided that the Celera challenge
could not be ignored.

He advanced the consortium's target finishing date to 2003 from 2005. He also
committed the consortium to producing a rough draft of the genome by June
2000. The draft would focus on the gene-rich regions of the genome (only 3
percent of human DNA codes for genes) and would make the most useful part of
the genome available to gene hunters much earlier than otherwise.

In December 1998, the consortium's two leading production centers, those of
Dr. Robert H. Waterston at Washington University in St. Louis and Dr. John E.
Sulston at the Sanger Center, reached an important milestone by completing
their pilot project on the roundworm genome, the first animal genome to be
decoded.

A few months later it became clear that the consortium's major centers had
ironed out their problems with the human genome and were producing large
amounts of DNA sequence on schedule, giving the lie to Dr. Venter's mockery
of their projections as those of a "Liars' Club."

Dr. Waterston and Dr. Sulston had predicted in October 1998 that Celera would
decode a lot of DNA fragments but would stumble in its plan to piece the
fragments together. "Assembly would likely be woefully inadequate," they
wrote.

But last March, Celera published the results of its first project, the
decoding of the Drosophila fruit fly's genome.

The fruit fly genome showed that Celera probably could assemble the human
genome with its quick method, and sharply raised the level of tension between
the two teams.

A neck-and-neck race ensued to see whether the consortium could announce
completion of its draft genome before Celera could declare that it had
finished its final genome assembly.

Though consortium scientists routinely deplore the attention given to the
race, the competition has benefited the world's biologists.

Without Celera's challenge, the consortium would have had little reason to
alter its academic flight path and produce the useful part of the genome
three years ahead of the 2003 landing date.

Without the consortium's challenge, Celera could be commanding top dollar for
its database, knowing customers had no alternative.

The two sides have adopted different strategies for sequencing the genome and
have produced results that are quite complementary, meaning that there was
always an underlying logic to combining their efforts.

But for many months of rivalry, no peacemaker succeeded in bringing the two
sides together.

Dr. Watson, whose book "The Double Helix" famously described the passions of
a scientific race, remained a committed supporter of the public program he
had shaped.

The consortium had succeeded, he said in an interview last month, "because
people liked and respected each other, and because the consortium wasn't out
for personal glory." He also gave credit to Dr. Leroy Hood and Dr. Lloyd
Smith, both then at the California Institute of Technology, who in the early
1980's pioneered the first DNA sequencing machines.

These slow early models reached their zenith in the latest generation of
machines known as capillary sequencers, like PE Biosystems' Prism 3700, the
machine that launched Celera, and Amersham Pharmacia's excellent though less
widely used Megabace. If the human genome project were allowed a robotic
hero, it would be the Prism 3700.

Only in the last month have the two sides moved nearer as Dr. Venter and Dr.
Collins saw mutual advantage in at least making a joint announcement of
progress.

Through prodigious efforts and expert management, both sides have achieved
remarkable success with their chosen approaches. Celera's whole genome
shotgun approach has proved faster, but both with its fruit fly and the human
genome, Celera has made use of data obtained by the consortium's
clone-by-clone approach. The best way of sequencing a genome may be to use
both methods.

If Celera's version of the human genome proves as good as its fruit fly
genome, scientists may judge it to have chosen the better path. Nonetheless,
both sides can fairly claim credit for the final result.

That biology has progressed from near total ignorance of the hereditary
material to possession of the entire human genome within 50 years is
testament to a hectic pace of discovery.

Even more remarkable is that a single individual, James Watson, should have
played such a signal role in both the opening of the drama and in its
conclusion.

========================================
