HP3000-L Archives

September 1998, Week 5

HP3000-L@RAVEN.UTC.EDU

From: Larry Boyd
Date: Tue, 29 Sep 1998 17:25:02 -0500

Wirt wrote, after Paul Christidis wrote, about Data Warehouses:

First, let me say that I, like Wirt, have no direct involvement in or revenue
tied to DW.  So, as with Wirt, these are only my opinions on the subject.

> This subject represents one of my current irritants, so please excuse the
> animated nature of my reply. In short, I feel that data warehouses are an
> idea designed primarily to separate users from their money. Moreover,
> implementing data warehouses is a process that tends to give the appearance
> of progress and activity at the expense of careful thought.

I disagree that an acceptable DW can be built without careful thought.  I
agree with Wirt that many attempt to "fix" problems associated with an
initial bad DB design with DW.  However, heaping bad design on top of bad
design only makes things worse, which may be why most DWs aren't providing
the solutions expected.

> I feel all the more strongly about this when the databases are IMAGE-based.
>
> Data warehousing was an idea that was born almost wholly in the RDBMS
> community, particularly among those people who had designed wildly
> over-normalized databases. Query extractions against these databases simply
> took forever -- and these people were faced with a choice: either use the
> database for data entry or data reporting, but not both.

Generally, I agree.  However, I might add that part of the problem was (and
is) the continued growth of data.  Most research indicates that companies
have data growth rates of 50%, and that this will continue well into the
future.  While computers continue to increase in performance, there always
seems to be some type of performance issue when either entering or reporting
data.  Few companies are satisfied with the performance of both, and those
that are dissatisfied with both have problems that no DW will solve.  The
solution to poor performance for both entering and reporting data is much
more complicated than adding a DW.  However, the solution to a performance
problem on one side (entering or reporting) may be a DW (or DB shadowing).
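
To put the growth figure in perspective, here is a minimal sketch of how a
50% growth rate compounds.  It assumes the rate is annual and constant, which
the figure above does not state; the function name and the 100 GB starting
size are purely illustrative.

```python
def projected_size(initial_gb: float, annual_growth: float, years: int) -> float:
    """Size after compounding growth once per year (assumed annual and constant)."""
    return initial_gb * (1.0 + annual_growth) ** years


if __name__ == "__main__":
    # Hypothetical 100 GB database growing 50% per year.
    for year in range(0, 6):
        print(f"year {year}: {projected_size(100.0, 0.50, year):.1f} GB")
```

Under those assumptions the data roughly doubles every two years, so a design
that performs well today has to be re-examined regularly.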

> Nonetheless, the first great rule of data processing 25 years ago was:
> don't duplicate your data. Keep a central, single point of control. If you
> duplicate any part of your operation, data, code, whatever, you screw
> yourself up in a thousand simple ways, all of which make life
> extraordinarily more complex.

As Wirt knows, I agree completely - don't duplicate your data, especially
control of the data.  However, even years ago, when an application outgrew
the performance capability of a 3000 it became necessary, unless your
application was such that removing data was an option.  Does this make it
more complex?  Absolutely.  On the other hand, because of performance/data
needs it may be the only solution.  If you do have to duplicate data, such
as with Netbase/Shareplex or a DW, then make sure you retain a single point
of control.
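
As a rough illustration of "duplicate the data, but keep a single point of
control", here is a minimal sketch.  It is not how Netbase/Shareplex or any DW
product works; the class names `Primary` and `ReportingCopy` are hypothetical,
and plain in-memory dictionaries stand in for real databases.

```python
from types import MappingProxyType


class Primary:
    """The only place where updates are accepted (the single point of control)."""

    def __init__(self):
        self._records = {}

    def write(self, key, value):
        self._records[key] = value

    def snapshot(self):
        # Hand out a copy so later writes do not leak into published copies.
        return dict(self._records)


class ReportingCopy:
    """Read-only duplicate that is refreshed only from the primary."""

    def __init__(self, primary):
        self._primary = primary
        self._view = MappingProxyType({})

    def refresh(self):
        # MappingProxyType rejects item assignment, so reporting code
        # cannot modify the duplicate by accident.
        self._view = MappingProxyType(self._primary.snapshot())

    def read(self, key):
        return self._view[key]


if __name__ == "__main__":
    primary = Primary()
    primary.write("invoice-1", 250)
    copy = ReportingCopy(primary)
    copy.refresh()
    print(copy.read("invoice-1"))
```

The reporting side may lag behind the primary until the next refresh, which
is the extra complexity you accept in exchange for fewer accessors on the
primary.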

> I don't think that there's any reason to believe that the rule has any less
> validity today.
>
> Moreover, a well thought out, properly indexed IMAGE database can be
> designed to allow both very high speed entry and very high speed
> retrievals, especially now that CIUPDATE allows you to add new automatic
> masters where necessary and these masters can be b-treed with such enormous
> ease.

It is true that a certain size IMAGE DB on appropriately sized 3000s can
obtain excellent performance for both reading and updating.  However, there
are many who have IMAGE DB's on the largest 3000s who cannot realize these
benefits.  This is one reason for both DW applications and shadowing
applications such as Netbase/Shareplex.  Many companies choose to run on the
3000 because of all the reasons we all love it (uptime, forward
compatibility, ease-of-use, etc.).  Unfortunately, some of their data
requirements create IMAGE DB's which, even with thorough consideration of
design, do not perform acceptably for both entry and reporting.

> Secondly, if the "data warehouse" is to appear on the same machine as the
> original database, what have you gained? CPU utilization will be the same
> or greater and database synchronizations are going to be a constant and
> pervasive problem, particularly so if you care about the accuracy of the
> data in the duplicated database.

The gain is from the reduced number of accessors to the same block of data.
Using a DW on the same machine as the DB will often make you ask, "what have
I gained?"  Generally, if the performance is acceptable by putting both on
the same machine, then you should be able to design the DB to perform well
for both entry and reporting.

> Adding keys where necessary to existing IMAGE databases for effective,
> high-efficiency queries is a virtually zero cost process, but it will get
> you virtually everything that any data warehouse vendor will promise you,
> at almost no increase in disc space utilization, CPU bother, or fiscal
> cost, but with all of the advantages attendant to simplicity of operation
> -- and those advantages can never be minimized.

In the 'old' days, we used to remove the historical data from the machine
completely.  For example, maybe we kept 2 years' worth of financial data on
line and pushed the rest off to an archival tape.  The two reasons were cost
of disk space and performance.  Today, disk space is relatively cheap
(although it is still nearly 8x tape costs), and the performance of
computers is extremely high (and will no doubt continue to increase).
Therefore, we no longer remove the financial data after 2 years - we keep it
on line forever.  In an environment where hundreds of thousands of
transactions (both read and update) are occurring on the data per day, using
a DW may make sense.

It's not a solution for everyone.  DW, and Netbase/Shareplex, are available
to address specific performance problems.  I know of several companies that
did thorough analysis of the IMAGE DB design, yet have so much data and so
many accessors that having read-only access to a copy of the data was
completely necessary.

(While I don't expect to be flamed by Wirt, I will now put my flame suit on
in case others do :)

lb
