CONFERENCE TRIP REPORT

5th Annual Wilshire Meta-Data Conference
& 13th Annual DAMA Symposium
Hosted by Wilshire Conferences, Inc. & DAMA International
This report compiled and edited by
Tony Shaw, Chairman, Wilshire Conferences
Contributing Trip Report Authors:
All conference attendees owe a debt of gratitude, and congratulations on a huge task exceptionally well done, to the following individuals:
Linda Kresl, Margaret O’Hara, David Plotkin, Anne Marie Smith
The Meta-Data Conference and DAMA Symposium were again co-located in 2001. The
combined event drew an audience of more than 1,000 attendees and speakers. The exhibit
floor included 35 companies showing the latest data management and development
products. To receive more information about this conference, and related future
events, go to http://www.wilshireconferences.com
This report contains summaries of the key discussions and conclusions from virtually all of the 60+ conference sessions, tutorials and workshops.
Reproduction policy:
This Conference Summary is intended for the use of the attendees at the 2001
Wilshire Meta-Data Conference and DAMA International Symposium.
As such, attendees may excerpt or reproduce any portion of the report for
the purpose of sharing information with colleagues and within their own
organizations. Any other reproduction, publication or editing of the report,
including placement of the report on other web sites, is not permitted without
the specific written authorization of Wilshire Conferences, Inc.
However, links to the report on the Wilshire Conferences web site may be
made without express permission.
©2001 Wilshire Conferences, Inc.
"Meta-Data Conference" and the Meta-Data Conference logo are service
marks of Wilshire Conferences, Inc.
Join us next year…

The 6th Annual Wilshire Meta-Data Conference
and the 14th Annual DAMA International Symposium
April 28 – May 2, 2002
www.wilshireconferences.com
META-DATA CONFERENCE & DAMA INTERNATIONAL SYMPOSIUM
March 4-8, 2001, Anaheim, California
Sunday, March 4 - WORKSHOPS
Half Day
W1: Data Modeling Essentials: Things Have Changed (Graeme Simsion & Graham Witt, Simsion & Bowles)
W2: The Operational Data Store: An Evolution of the Data Warehouse (Jonathan Geiger, Braun Consulting, Inc.)
W3: Knowledge For Action: The Discipline of Spreading Knowledge (Robert S. Seiner, TDAN & CIBER, Inc.)
W4: Peter Aiken, Institute for Data Research, Virginia Commonwealth University
Monday, March 5 - TUTORIALS
Full Day
T1: Zen and the Art of Data Modeling (Alec Sharp, Damex Consulting)
T2: Applying Quality Principles to Data Definition and Data Modeling (Larry P. English, INFORMATION IMPACT International, Inc.)
T3: Developing a High-Quality Data Resource to Support Information Needs (Michael Brackett, Data Resource Design & Remodeling)
T4: John Zachman, Zachman International
T5: Building and Managing the Meta Data Repository (David Marco, Enterprise Warehouse Solutions)
T6: Debbi Walsh & Hal Davis, XML Solutions
T7: Application and Data Integration (Sridhar Iyengar, Unisys Corporation)
T8: Data Architectures for Scalable E-Commerce (Michael Stonebraker, Cohera Corporation)
Tuesday, March 6 - CONFERENCE SESSIONS
8:45 - 9:45
KEYNOTE PRESENTATION: The Agile Organization (Tom DeMarco, Atlantic Systems Guild)
10:15 - 11:15
C1: Pouring the Foundation for the Information Age: Data Architecture at USAA (Andres Perez, USAA)
C2: Data Quality as a Profit Center (Wendy Wood, SBC Services)
C3: Introduction to the Unified Modeling Language (Eric Naiburg, Rational Software Corporation)
C4: Implementation/Use of Operational Meta Data to Improve Data Quality in the Data Warehouse (Michael Jennings, Hewitt Associates)
C5: David Hay, Essential Strategies
C6: Alan Perkins, Visible Systems Corporation
11:25 - 12:25
Data Management Support for Enterprise Architecture (Brett Champlin, Allstate Insurance Company)
C8: Business Rule Specification, Validation and Transformation: Advanced Aspects (Terry Halpin, Microsoft Corporation)
C9: Business Processes and Logical Process Modeling (Anne Marie Smith, LaSalle University)
C10: Redefining Meta Data Strategy in the 21st Century (Ron Klein, Carswell Thomson Professional Publishing)
C11: Build Your Own Web-Based Meta Data Repository (Joseph Newcum, Bank One)
The Role of Data Administration in Managing an Enterprise Portal (Arvind Shah, Performance Development Corporation)
1:45 - 2:45
C13: Corporate Data Architecture in a Federated World (Deborah Henderson, Hydro One Networks Inc.; Vladimir Pantic, IBM Global Services)
C14: Facilitation and the Successful Architect (Shelley Lieberman, Mathtech)
C15: The Practical Use of a Universal Data Model in the Data Warehouse (David Lepley, Tyco Electronics)
C16: Understanding and Managing Reference Data (Malcolm Chisholm, Deloitte & Touche)
C17: Architecting and Implementing a Web-Based Corporate Meta Data Repository (CMR) at the Census Bureau (Gail Wright, Oracle Corporation)
C18: Building the XML Meta Data Repository (David Plotkin, Longs Drugs)
3:15 - 4:15
C19: Jill Dyche, Baseline Consulting Group
C20: Elevating the Role of Information Resource Management for Business Effectiveness (Larry P. English, INFORMATION IMPACT International, Inc.)
C21: PANEL: Comparison of Modeling Techniques (Graham Witt, Alec Sharp, Terry Halpin, Eric Naiburg)
C22: Meta Data - Myth and Realities (John Ladley, Knowledge InterSpace, Inc.)
C23: The UPS Meta Data Repository - A Success Story: Taking the Next Steps (Patti Munier, United Parcel Service)
C24: Universal Data Models for Web Information Management (Len Silverston, Universal Data Models)
Thursday, March 8 - CONFERENCE SESSIONS
8:30 - 9:30
Enterprise Information Architecture: "Starter Kit" Models (Jane Carbone, DATANOMICS, Inc.)
C50: Michael Gorman, Whitemarsh Information Systems
C51: Conceptual Data Modeling in an Object-Oriented Process (Scot Becker, InConcept, Inc.)
C52: A Success Story: Enterprise Customer Meta Data Definition/Implementation (Barbara Peterson, Agilent Technologies)
C53: Warren Selkow, Consultant
C54: Patricia Klauer & Robert Cooley, Apex Solutions, Inc.
9:50 - 10:50
C55: Action Business Rules – Getting to Yes (Judi Reeder, Consultant)
C56: Natalie Arsenault, First Union National Bank
C57: Dave Buch, Capital One
C58: Joe Danielewicz, Motorola, SPS
C59: PANEL: New Trends in Meta Data (Robert Seiner, TDAN and CIBER, moderator; Don Soulsby, Computer Associates; James Jonas, Oracle)
C60: OMG CWM - An Architecture for Enterprisewide E-Business Intelligence Integration (Sridhar Iyengar, Unisys)
11:10 - 12:30
CLOSING KEYNOTE PANEL: Data Management – Where to From Here?
SUNDAY WORKSHOPS
Workshop 1: Things Have Changed
Speakers: Graeme Simsion and Graham Witt, Simsion Bowles & Associates
Summary by Carey Clark
In an industry known for its
hype and self-importance, the two Grahams sparkle for their self-deprecating
honesty. They speak their minds and welcome rebuttal. Controversy is a good thing
and not to be avoided.
What is different now from when his first book was
published?
Object Orientation has not made
conventional data modeling obsolete. It’s great for some software projects but
can create more headaches than help for “persistent data” applications.
Typical pitfalls:
UML is not their preferred
modeling nomenclature for a number of reasons:
Data types are now more complex and include more
user-defined data. Spatial, video, audio and image data deserve their own types.
One must resist the habit of converting data into characters or numbers when a
richer data type makes sense. For example, an address can be its own data
type and treated as a single thing.
Derived data needs to be modeled and defined even when
it won’t be stored in a database.
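To make the two points above concrete, here is a minimal Python sketch, assuming a hypothetical order with a shipping address: the address is modeled as one value type rather than loose character columns, and the order total is derived data that the model defines but never stores. The class and field names are illustrative only, not taken from the workshop.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Address:
    """An address modeled as one value, not as loose character columns."""
    street: str
    city: str
    postal_code: str
    country: str


@dataclass
class OrderLine:
    quantity: int
    unit_price: Decimal


@dataclass
class Order:
    ship_to: Address          # a single attribute of type Address
    lines: list[OrderLine]

    @property
    def total(self) -> Decimal:
        """Derived data: defined in the model, computed on demand, never stored."""
        return sum((line.quantity * line.unit_price for line in self.lines), Decimal("0"))


order = Order(
    ship_to=Address("1 Main St", "Anaheim", "92801", "US"),
    lines=[OrderLine(2, Decimal("3.50")), OrderLine(1, Decimal("10.00"))],
)
print(order.total)  # 17.00
```

Because `total` is a property, its definition lives in the model while nothing extra is persisted; if it were ever stored, the rule for recomputing it would already be documented.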
Business rules need to be captured. How they are stored
depends on their volatility. All such rules are subject to challenge. It is the
modeler’s responsibility to suggest changing existing rules when they don’t
make data modeling or business sense.
In some circumstances one must allow a rule to be broken.
The problem arises when there is a need to collect data that doesn’t comply
with a rule.
Naming data is extremely important. When names don’t
denote what they mean, the ambiguity becomes widespread. This is even more true with
XML and the increased interaction with other businesses. When there is an
industry naming standard (e.g., an industry XML vocabulary), it is best to go with it. Even if not
optimum, it beats having to translate more than necessary.
Data modelers need to be involved with the use of their
models after they are completed. It is not uncommon to see developers ignore
or misuse them. Many hours of confirmation effort can be jettisoned
when a developer assumes the model contains a mistake and simply overrides it.
Meta data needs to be available to everyone who needs to
see it: users, process modelers, and developers. If it’s not used, it doesn’t
add value. “You can’t have data quality without meta data quality.”
In one survey, they found that for a good percentage of
the decisions data modelers felt were theirs to make, data administrators
thought the decisions were theirs instead. It behooves the two functions to reach
agreement on responsibilities.
Although there wasn’t time to go into detail, they
touched on how to present large data models (e.g., corporate data models) to
executives. The consensus was to break the model up into small chunks, since the
whole model tends to bewilder the uninitiated.
They presented a process diagram of the data modeling
process. It was realistic and useful.
Workshop 2: The Operational Data Store: An Evolution of the Data Warehouse
Speaker: Jonathan Geiger, Vice President, Braun Consulting, Inc.
Summary by Dale Kohlmoos
Jon Geiger gave a three-hour presentation on an
operational data store (ODS) designed for the tactical analysis of
subject-oriented data. Jon mapped out the essential steps that should be taken
in the development of an ODS in an integrated enterprise system.
He began with a high level evaluation of enterprise
systems and described where the ODS could be positioned for optimal use as a
tactical tool for analysis. The ODS was presented as a tool used to complement a
warehouse and its associated data marts. The intent of the ODS is the tactical
execution of the strategies identified in the warehouse. In order to accomplish
this, he described the ODS as demanding a high degree of query performance and
availability.
Characteristics of the ODS are that it is
subject-oriented, integrated, current and volatile. It is intended to be a
central point of data integration for business management. This view was further
broken down into four classes. Each class was described by the update frequency,
degree of integration, transformation and summarization.
ETL tools play a significant role in the management of the
ODS and were described as an architectural consideration. The high level of
integration, transformation and summarization precludes most other forms of
loading.
Jon introduced the concept of Oper-Marts, or ODS Data
Marts, which are much like the familiar OLAP reporting cubes, summary tables and small
star schemas. Oper-Marts are rebuilt frequently because they reflect data only at a
specific point in time and lag behind the ODS data updates.
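A minimal sketch of the Oper-Mart refresh cycle, assuming a hypothetical in-memory dictionary stands in for a volatile ODS table (the function, keys and figures are illustrative, not from the workshop). The mart is a point-in-time summary, so it keeps showing the old figure until it is rebuilt:

```python
from collections import defaultdict
from datetime import datetime


def build_oper_mart(ods, built_at):
    """Summarize current ODS rows into a point-in-time Oper-Mart (hypothetical schema)."""
    totals = defaultdict(int)
    for row in ods.values():
        totals[row["region"]] += row["amount"]
    return {"built_at": built_at, "totals": dict(totals)}


# Hypothetical ODS table keyed by order id; the ODS is volatile (updated in place).
ods = {
    1: {"region": "West", "amount": 100},
    2: {"region": "East", "amount": 40},
}
mart = build_oper_mart(ods, datetime(2001, 3, 4, 9, 0))

ods[2]["amount"] = 75                  # the ODS changes after the mart was built
stale_east = mart["totals"]["East"]    # still 40: the mart lags the ODS

mart = build_oper_mart(ods, datetime(2001, 3, 4, 10, 0))  # scheduled rebuild
fresh_east = mart["totals"]["East"]    # now 75
```

Recording `built_at` with the summary lets analysts see how far the mart lags the ODS, which is what drives the rebuild frequency.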
ODS data model examples and aspects of tuning and
scheduling were presented to help give the audience a good background for
consideration of an ODS implementation. From there Jon went into overall
architectural considerations with respect to e-Business, CRM, Finance and
Insurance.
The methodology for implementation included examples from
project management, design phases, project phases, project definition, process
definition, process modeling, deployment and all the associated deliverables.
Good examples were given to demonstrate what needs to be considered to drive a
successful implementation of an ODS.
Last but not least, Jon reviewed data quality issues and
expectations for an ODS. These are much the same as those seen throughout the
enterprise, but he offered suggestions for when and where the issues may be caught
and cleaned up, along with a look at their impact on the tactical analysis
performed on ODS data.
Workshop 3: Creating Competitive Advantage through Knowledge Management
Speaker: Robert Seiner, Publisher, TDAN and BI/DW Director & Principal, CIBER, Inc.
Summary by Margaret O’Hara
Using the example of a grocery chain opening a new store,
Bob Seiner stepped the audience through the process of creating competitive
advantage through knowledge management (KM). After noting that this was the
first workshop on Knowledge Management presented at a Meta-Data/DAMA-I
conference, Seiner stated that there was a logical progression from managing data
to managing information to managing knowledge.
Seiner first defined KM as the discipline of spreading the
knowledge of individuals and groups across the organization in ways that
directly affect performance. He emphasized that the spread of knowledge
was critical, as knowledge cannot be helpful unless it is shared. The vision of
KM is that the right information, in the correct format, gets to the right
person, at the right time, for the right business purpose.
Seiner noted that the amount of information produced annually is
approximately 250 MB for every person on the planet, and that this amount is
expected to increase. Thus, managing knowledge is a daunting task for all
organizations.
To set the stage, Seiner offered the first of his many
store-opening examples. As part of a KM project, he interviewed employees from
two recently opened stores in the grocery chain. Employees in Store #1 reported
significant problems with one aspect of the opening: receiving deliveries.
Store managers solved the problem and learned to manage their deliveries. When
interviewing employees in Store #2, he discovered that they had experienced the
same problem, but because the first store had not shared its knowledge, Store #2
went through the same painful process of solving the problem.
The first business impact of Knowledge for Action is that
information is provided 24x7 in a customizable and detailed view to everyone who
needs it. Thus, knowledge is recorded (i.e., becomes an artifact). Moreover, the
knowledge is well managed and employees learn from past decisions. There exists
a sharing of best practices and innovation. Most importantly, a KM program
reduces the risk from attrition. To sell KM initiatives to senior management,
Seiner recommends you start small and focus on investment rather than costs
(i.e., on the payback of the project).
KM project planning should start with executive business
sponsorship and should involve a knowledge audit. Audits employ both qualitative
and quantitative assessments as well as a readiness assessment. Scoping sessions
are important: Seiner recommends starting with “a slice of a slice of the
pie”, and identifying the “most ready” of all documented knowledge.
Questions to ask within the organization during the audit include:
Seiner also stressed that changing behavior was important
to the process and suggested that accountability for knowledge become part of
peoples’ jobs – in most cases being written into the job descriptions.
To develop the Enterprise Knowledge Platform, Seiner
suggests careful assessments, performing the knowledge audit, planning for the
short, intermediate and long-term, creating the employee portal, and making the
standard build vs. buy decision.
Although time ran short in the presentation due to the
number of questions and comments from the very involved audience, Seiner did
have time to stress that the knowledge portal was not the only consideration in
KM. While the portal’s functional design, graphic capabilities and degree of
personalization were important, the portal is only a starting point web site
where employees can enter, find and access knowledge.
Workshop 4
Speaker: Peter Aiken, Institute for Data Research, Virginia Commonwealth University
Summary by Linda Kresl
Peter is a proponent of meta data management. He began the
presentation by pointing out that “meta data” isn’t a very accurate term. Many
IT and business managers don’t understand the importance of meta data, and many
ask, “Why do meta data?” Yet managing meta data is one of the most important activities
within Data Resource Management. Another name for meta data is “data
resource data.” Meta data describes business processes, both
technical and business related.
Deriving a legacy architecture is a major reason to create
meta data. Every system has an architecture, however poor or rich. Meta data is the
language of the architecture; it is how we understand and articulate the
architecture. Meta data describing system data can be considered
multidimensional. A lack of meta data is the primary reason for
re-engineering failures.
A data model is an excellent place to begin the process of
meta data creation and definition. A model depicts the data implementation, data
design or system data requirements. Meta data engineering and data
re-engineering are inextricably linked. What is a meta data data model? It is a data
model that describes or characterizes system components, not business data.
Tools that reverse engineer meta data include SAS and Evoke (best used if the organization
doesn’t have a data model). These tools have a built-in QA function.
Meta data engineering:
As-is data implementation assets
  - Reconstitute data design
  - Recreate data implementation
As-is data design assets
  - Recreate data design
As-is information requirements assets
  - Reconstitute requirements
  - Recreate requirements
  - Redevelop requirements
To-be requirements assets
To-be data implementation assets
To-be design assets
Redesign data
A meta data model is the key to quickly implementing data
conversions, understanding business processes, and gaining knowledge about
packages (PeopleSoft, SAP, etc.).
TUTORIALS
Tutorial 1: Zen and the Art of Data Modeling
Speaker: Alec Sharp, Founder, Damex Consulting
Summary by Arnie Hook
Alec teaches the outline and guidelines for good practices
to arrive at a data model that satisfies business needs. He embeds humor to
make a point and to keep the audience involved with his inspirational
messages. The analyst must be able and willing to do a variety of things in
order to arrive at the appropriate data model.
Alec’s messages: Design the content to fit your needs.
Extend and communicate the use of data management. Communicate the objectives of
the design practices across the business. Reverse engineer to the blank page.
What is the direction?
Level set: agree on the basics. Consistency is key to
success. There are many ways to describe a business. The data model describes what
the business needs information about; it is a non-technical description
of the business, not a database. The model must be maintained at all levels.
Level set to the 3 types of data model:
Do not violate the four ‘Ds’ of modeling:
Alec describes the ‘facilitated session’ process for
analyzing the business requirements. The technique keeps each session consistent and
scoped to the objectives. Make an agenda and schedule for each subject session.
Participants need to understand their roles and responsibilities (‘establish
the behavior contract’) for each session. Alec coaches a ‘bus tour’ recipe
for facilitating toward a correct model.
The last step is to review with ‘rhetorical context’.
Know the audience, occasion, and purpose. Then answer the data questions with a
storyboard format.
Alec takes the attendees through the course, practicing
modeling principles and techniques for each level of analysis. The tutorial
is a great workshop for the novice or the expert. Even if you know it all,
this tutorial should be on your list.
Tutorial 2: Applying Quality Principles to Data Definition and Data Modeling
Speaker: Larry English, President, INFORMATION IMPACT International
Summary by Margaret O’Hara
The premise of English’s presentation was that since
information is the product of a process, Deming’s quality principles can be
applied to develop Information Quality. English defines information quality as
“consistently meeting knowledge worker and end-customer expectations through
information and information services”. This involves the quality of data
definitions, data content and data presentation. English offered the following
as an example of a poor-quality data definition:
Data Element: Payment Date
Definition: Date of Payment
As in this example, the stated definitions for
data elements are very often too vague to be of much use to the organization. Does this
date refer to the date the check was received, the date it was written, the date the
monies were credited, or the date the transaction was entered?
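One way to apply the point, sketched in Python under stated assumptions: replace the single vague element with one attribute per event that the question above lists, each carrying its own definition. The `Payment` class and `defined` helper are hypothetical; `dataclasses.field(metadata=...)` is the standard-library mechanism used here to attach the definition text.

```python
from dataclasses import dataclass, field, fields
from datetime import date


def defined(definition: str):
    """Hypothetical helper: attach a business definition to a dataclass field."""
    return field(metadata={"definition": definition})


@dataclass
class Payment:
    # One precisely defined attribute per event, instead of a vague "Payment Date".
    check_written_date: date = defined("Date the check was written by the payer.")
    check_received_date: date = defined("Date the check was received by our organization.")
    funds_credited_date: date = defined("Date the monies were credited to our account.")
    transaction_entered_date: date = defined("Date the payment transaction was entered into the system.")


def data_dictionary(cls) -> dict:
    """Return each attribute name with its definition."""
    return {f.name: f.metadata["definition"] for f in fields(cls)}


for name, definition in data_dictionary(Payment).items():
    print(f"{name}: {definition}")
```

Keeping the definition next to the attribute means the data dictionary can be generated from the model itself rather than maintained as a separate, drifting document.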
The benefits of information are that work processes are
transformed and that clerical workers are “informated” (i.e., they become
knowledge workers). All too often, knowledge workers either use data for
something other than what it was defined for, or have no idea that anyone else in the
organization is using the same data. An IQ initiative can help avoid these
problems.
English proposes that we eliminate the word “user”
from our vocabulary and instead describe those employees who use information in
their jobs as knowledge workers.
English set forth several quality principles. These
involve a customer focus, process improvement, scientific methods and management
accountability. Most organizations do not hold managers responsible for the
information their departments generate. English spent considerable time
explaining Kaizen (the art of continuous improvement) and its application to the
Information Resource Management area.
Principle #1: Create a constancy of purpose for
improvement of the information product and service. Because the obligation to the
customer never ceases, the information quality ramifications are that the IRM
mission and objectives should be defined to include total quality for both its
services and products, and that plans should be developed with both long- and short-term
deliverables that support strategic business objectives.
Principle #2: Adopt the new philosophy of Quality
Information Management that will transform both the business and IS management.
The quality information philosophy means reliable information management and
shared information to reduce costs.
English next focused on how to assure data definition
quality. He believes that instead of data documentation we should engage in data
definition, stating precisely the meaning of words. He stressed that
a definition should not be more difficult to understand than the word it
defines. English also feels that we should avoid the term “meta data” except
in technical forums. The Knowledge Worker (not the user!) will better understand
the phrase Information Product Specification (IPS). An IPS is a detailed, exact
statement of particulars. Among the goals of data definition are (1) to enhance
communication, assuring that transmitted information, thoughts and feelings
are satisfactorily received and understood, and (2) to increase
productivity.
English then presented the concept of Total Quality data
Management (TQdM), which will establish the Information Quality Environment. He
proposes that TQdM is not a program but instead a value system and habit of
continuous improvement of both application and data development processes and
business processes. English illustrated the TQdM process using a data flow
diagram. The steps in establishing the IQ environment are:
1. Assess the data definition & IQ architecture quality (output: data definition quality assessment)
2. Assess information quality (output: information quality assessment)
3. Measure non-quality information costs (output: information value / cost analysis)
4. Reengineer and cleanse data (output: corrected data)
5. Improve information process quality (output: information process improvements)
English discussed data definition quality characteristics
such as conformance to meaningful enterprise standards, consistency of data
names, and complete domain values with definitions. He also stressed the
importance of data standards quality, including such issues as enterprise-wide
guidelines, meaningful abbreviations and complete, precise, non-overlapping
class words. English illustrated the importance of determining all definitions
of a word with the business term “volume”, presenting three diverse
definitions of the word, each used by a different business segment.
After giving several examples of data definitions and
business rules that illustrated high and low quality, English had the audience
assess a specific attribute definition using a Data Definition Quality /
Usefulness Assessment Form. Working in small groups, the attendees assessed one
attribute definition. This brief exercise generated much discussion, demonstrating
the complexity of achieving even one small part of information
quality.
English then presented the basics of Information
Architecture (IA) quality and suggested guidelines for achieving a high quality
architecture. Such architectures are characterized by completeness, stability,
and flexibility. Moreover, these architectures can be reused with a minimal
degree of modification. “A well-defined architecture supports tomorrow’s
business needs as well as today’s”.
English then described the TQdM process #5: Improving
Information Process Quality by presenting the Quality, Time, Money triangle.
Essentially, maximizing any one of the three points means the other two will
suffer. Typically, an organization can achieve two, but not three of the
objectives.
Toward the end of the day, English provided metrics to
measure information quality, stating that choosing the lowest-price alternative
may result in the costliest action. He believes that organizations, instead
of asking for a cost/benefit analysis of “shared” DBs and enterprise data
modeling, should ask what redundant applications cost, as well as the
cost of change requests to the original product specifications. He reminded the
audience once again that Total Quality data Management is not a program; it is a
value system, mind set and habit of continuous improvement.
Tutorial 3: Developing a High-Quality Data Resource to Support Information Needs
Speaker: Michael Brackett, Consulting Data Architect, Data Resource Design & Remodeling
Summary by Dale Kohlmoos
Michael Brackett gave a full day presentation that
addressed a lot of the commonly experienced limitations of our current data
resources. He discussed how we can turn those limitations around for more
refined data resources that could better meet information demands.
He reviewed and discussed current data situations, data
resource concepts, resolving data disparity and cultural considerations. The
current data situation is that disparate data is a truism. The result of this
disparate data is the inability to integrate data to meet the information
demand. He described four basic data problems that are commonly seen throughout
most organizations:
The demand for integrated data to support business needs
is high, yet disparate data continues to be produced at a rapidly increasing
pace. Mr. Brackett described the current status quo as potentially leading the
organization to failure due to information deprivation. He emphasized that
tools neither understand technology nor automate understanding: people are the key,
and tools support people.
Mr. Brackett discussed the structure of the Business
Intelligence Value Chain and noted that the data resource is the foundation of
all the other structures, a sobering reminder that we all need to
revisit every so often. Mr. Brackett also brought to mind the debate on whether
the data resource is considered an asset or a resource.
Further discussion reviewed data architecture and the
corresponding position of the data resource within that architecture. From
within that architectural perspective, Mr. Brackett identified ways and means to
both halt and resolve existing data disparity. From there, the session delved
into detailed examples, principles, and practices for refining data definitions,
data structures, data integrity, data documentation, data orientation, data
availability, data responsibility, data vision, and data recognition.
The next step was to discuss the data resource transition
and how to implement better practices, along with the cultural
considerations that must be addressed to make it happen.
Mr. Brackett concluded his presentation by stressing
that there is no “silver bullet.” The techniques are available, and it
is time to develop a high-quality data resource that can meet the information
demands of each organization.
Tutorial 4
Speaker: John Zachman, President, Zachman International
No summary is available for this tutorial.
Tutorial 5: Building and Managing the Meta Data Repository
Speaker: David Marco, President, Enterprise Warehouse Solutions
Summary by Carey Clark
David Marco’s presentation
was aimed at those new to the meta data imperative and included sections on
basic meta data terms, definitions, concepts and justifications. He also made
the case for treating repository creation as a project and using formal project
management methods. The presentation is drawn from David’s book of the same
title.
It is important to relate and
document the business benefit of the repository. This benefit is usually to
increase revenue or reduce costs. Repositories need to be built iteratively with
value added at intermediate stages.
David likes to put data
quality information in the repository rather than in the data warehouse because
more people can get to it there and it can be related to more systems.
He estimates that 35% of
IT budgets are spent on integration. In his experience, a company’s data
doubles every 4 years; hence managing this data is critical to
effective growth.
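As a worked check of the doubling estimate, assuming smooth compound growth (the presentation gave only the doubling period), with V_0 today's data volume, t in years and r the implied annual growth rate:

```latex
V(t) = V_0 \cdot 2^{t/4}, \qquad
r = 2^{1/4} - 1 \approx 0.189, \qquad
V(8) = 4\,V_0
```

That is roughly 19% more data each year, and four times today's volume after eight years.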
He
separated meta data into business related and technical related areas. Most of
what one audience needs to see, the other audience doesn’t.
A lot of his projects are
aimed at data warehouse construction. They deal mostly with extraction,
transformation and load (ETL) activities rather than with business names, definitions and
their maintenance.
His list of MUSTS includes:
David uses a classic decision
matrix for determining the best tool. Each requirement is given an importance and a
complexity (i.e., a cost), and each tool is then matched against this matrix.
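The summary does not say exactly how importance and complexity are combined, so the following Python sketch assumes one simple scheme: a tool's benefit is the summed importance of the requirements it meets, and its gap cost is the summed complexity of those it misses. Requirement names, tool names and weights are all hypothetical.

```python
from typing import NamedTuple


class Requirement(NamedTuple):
    importance: int  # e.g. 1 (nice to have) .. 5 (must have)
    complexity: int  # relative cost of meeting the requirement


def evaluate(requirements, tools):
    """Return (tool, benefit, gap_cost) tuples, best first.

    Assumed scoring: benefit sums the importance of requirements a tool meets;
    gap_cost sums the complexity of the requirements it does not meet.
    """
    results = []
    for tool, met in tools.items():
        benefit = sum(r.importance for name, r in requirements.items() if name in met)
        gap_cost = sum(r.complexity for name, r in requirements.items() if name not in met)
        results.append((tool, benefit, gap_cost))
    # Highest benefit first; ties go to the tool with the lower gap cost.
    results.sort(key=lambda row: (-row[1], row[2]))
    return results


requirements = {
    "business definitions": Requirement(importance=5, complexity=3),
    "ETL lineage": Requirement(importance=4, complexity=5),
    "web access": Requirement(importance=3, complexity=2),
}
tools = {
    "Tool A": {"business definitions", "web access"},
    "Tool B": {"ETL lineage", "web access"},
}
for tool, benefit, gap_cost in evaluate(requirements, tools):
    print(f"{tool}: benefit={benefit}, gap cost={gap_cost}")
# Tool A: benefit=8, gap cost=5
# Tool B: benefit=7, gap cost=3
```

Ranking on benefit first and gap cost second is only one design choice; a single combined score would work equally well if the team agrees on the weights up front.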
Tutorial 6
Speakers: Debbi Walsh, Technical Director, and Hal Davis, Consultant, XMLSolutions
Summary by David Plotkin
Introduction and Business Case
The tutorial began with a brief introduction of what XML
is, including an intuitive diagramming technique for showing how XML labels data
– giving it more meaning and making it more understandable than a simple flat
file. The design goals of XML were reviewed, giving us a good idea of the
reasons why we might want to introduce XML into our organizations. As part of
this justification, a series of business scenarios were presented, and in each
case the advantages that XML provides were made clear.
Documents and Structure
The tutorial continued with the definition of the rules
for creating a "well-formed" XML document, including the single root
element, proper element nesting, quoting of attribute values, and the naming
conventions.
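The well-formedness rules above can be checked with any non-validating parser. A minimal sketch using Python's standard `xml.etree.ElementTree`, which raises `ParseError` on malformed input; the sample documents are made up for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if `text` parses as well-formed XML. This checks syntax only, not validity."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "well formed": '<order id="42"><item>Widget</item></order>',
    "two root elements": "<order/><order/>",
    "improper nesting": "<order><item></order></item>",
    "unquoted attribute": "<order id=42></order>",
    "name starts with a digit": "<1order/>",
}
for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

Each failing sample breaks exactly one of the rules listed: single root element, proper nesting, quoted attribute values, and element naming.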
Validation of an XML document can take place either
via the well-accepted DTDs or via the newer and more powerful XML Schemas. The
syntax for defining DTDs was discussed, including the details of processing
instructions, the XML declaration, elements, attributes, and comments. The
different types of elements were covered, such as text, empty, mixed, and
element (a content model that consists of sub-elements). The different types of
attributes were also covered, as well as how to declare optionality and
cardinality. Namespaces (which allow element names to be reused across vocabularies
without colliding) were covered with examples.
XML Schemas were discussed in significant detail, including simple and complex
data types, and declaring your own data types. In addition, the reuse aspects of
XML Schemas (one of their primary advantages) were shown.
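Here is a small sketch, using Python's standard xml.etree.ElementTree module, of how namespaces let two vocabularies share an element name. The `urn:example:...` namespace URIs and the element names are placeholders, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define a <name> element; the namespace URIs keep them apart.
DOC = """
<order xmlns:cust="urn:example:customer" xmlns:prod="urn:example:product">
  <cust:name>Acme Corp</cust:name>
  <prod:name>Widget</prod:name>
</order>
"""

# Prefix -> URI map used when searching; the prefixes need not match the document's
NS = {"cust": "urn:example:customer", "prod": "urn:example:product"}

root = ET.fromstring(DOC.strip())
customer_name = root.find("cust:name", NS).text
product_name = root.find("prod:name", NS).text
print(customer_name, "/", product_name)
# ElementTree reports each tag in {namespace-uri}local-name form
print([child.tag for child in root])
```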
After discussing how to build validation documents, the
presenters covered the details of connecting a DTD to an XML document for use by
a validating parser. In addition, general and parameter entities (both internal and
external) were covered with an excellent and concise chart.
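The following sketch shows how a document can carry a DTD in its DOCTYPE internal subset and declare an internal general entity there. Note the assumption: Python's xml.etree.ElementTree is a non-validating parser. It reads the internal subset and expands the entity, but it does not check the document against the element declarations. A validating parser would be needed for that.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares the elements and one internal general entity.
# The parser substitutes the entity's value wherever &org; appears.
DOC = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (from)>
  <!ELEMENT from (#PCDATA)>
  <!ENTITY org "Wilshire Conferences">
]>
<memo><from>Memo from &org;</from></memo>"""

root = ET.fromstring(DOC)
print(root.find("from").text)
```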
RDF
The presenters covered RDF, although it was somewhat
difficult to see the application of RDF in the context of XML. There are some
similarities, but not strong ones.
Transformations
One of the most useful parts of the whole presentation was
the section on transformations. Using XSLT (.xsl), Hal demonstrated how an XML
document can be displayed through a style sheet (itself written in XML), and showed how the entire
"look" of the document could be changed by changing the associated
style sheet. He also demonstrated how the XML document could be converted into
another form, be it another XML document, a plain text file, or whatever. The
presentation covered the exact flow of how the XML content was converted,
including the use of a parser, and even included a brief rundown of some of the more
common XSLT commands. He also covered XSL-FO (.fo), which applies formatting to
convert the output of XSLT to PDF, HTML, or printer output.
A parser exposes the document either through DOM or through SAX, and Hal covered the
advantages and disadvantages of both. DOM needs more memory and
is not as quick as SAX, but since it keeps the whole "tree" in memory,
the program using the parser output can navigate the nodes of
the tree more freely.
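Python's standard library ships both a SAX parser (xml.sax) and a minimal DOM (xml.dom.minidom), which makes the DOM/SAX trade-off easy to see. In this sketch, which uses an invented catalog document, the SAX handler must keep its own state as events stream past. The DOM version can walk from a `<title>` node up to its parent at will.

```python
import xml.sax
from xml.dom import minidom

DOC = b"""<catalog>
  <book id="b1"><title>Data Modeling</title></book>
  <book id="b2"><title>Meta Data</title></book>
</catalog>"""

# SAX: the parser streams events to a handler; nothing is kept unless we keep it.
class TitleCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def startElement(self, name, attrs):
        self._in_title = (name == "title")
        if self._in_title:
            self.titles.append("")

    def characters(self, content):
        if self._in_title:
            self.titles[-1] += content   # text may arrive in several chunks

    def endElement(self, name):
        if name == "title":
            self._in_title = False

handler = TitleCollector()
xml.sax.parseString(DOC, handler)

# DOM: the whole tree is in memory, so we can move around it freely,
# e.g. from each <title> up to its parent <book> to read the id attribute.
dom = minidom.parseString(DOC)
ids_by_title = {
    t.firstChild.data: t.parentNode.getAttribute("id")
    for t in dom.getElementsByTagName("title")
}
print(handler.titles)
print(ids_by_title)
```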
Resources
XML has a considerable number of resources available –
standards, products, and information on the internet. Hal and Debbi briefly
covered these topics, and provided a CD that contained all of the XML standards
being considered. They were less thorough with the editors, databases,
transformation tools, and servers that are available today, merely stating that
there were plenty of choices.
Data Management/Schema Design
The last two sections briefly covered two topics of
considerable importance to Data Administrators getting involved with XML. The
first was the challenge of managing these new flavors of schema
and this whole new environment. They provided some recommendations on managing
names, accuracy and descriptiveness, and modularity and reuse. There ARE
industry standards emerging, and where possible, it is a good idea to try to
use the common schemas for an industry. Finally, they covered what you should be
concerned with when trying to manage your schemas centrally, including the
ability to browse, do impact analysis, impose good design practices, dynamically
access and generate schemas, and import and export schemas from various sources.
Summary
by Ron Klein
Sridhar’s insights and
knowledge contributed greatly to our awareness of what is coming in the
standards area. His tutorial presentation included discussion of various
evolving OMG (Object Management Group) standards, models and protocols, such as:
CWM – Common Warehouse Metamodel
UML – Unified Modeling Language
XMI – XML Meta data Interchange
MOF – Meta Object Facility
Much of the discussion was
driven by questions from the audience, hence this summary draws substantially on
those questions.
Quick history of OMG: founded 1989, now more than 800 vendors.
1991 - CORBA 1.0
1995 - CORBA 2.0
1997 - MOF and UML
1999 - XMI and CORBA Components
2000 - CWM, XML.Value, EDOC (Enterprise Distributed Object Computing), XMI for XML Schema
2001 - UML for EDOC, UML 2.0, better XML and E-Business integration
OMG is broadening its scope, moving to a Model Driven Architecture (MDA).
It is targeting middleware technologies in the data management and application
development realms.
The Meta Data Coalition (MDC)
merged into OMG during 2000. CWM became the common standard last June (2000) and
had a revision published last week (Feb 26, 2001) based on vendor experiences.
The Data Integration Problem
- Emerging XML issues include new XML data types, integrating XML with middleware technologies and into core database technologies.
- The Internet is driving us from small to large databases.
- The transformation of information from one technology to another leads us to CWM as a solution.
What is needed to solve the Integration Problem?
- Meta data becomes more and more important.
- Moving to XML APIs.
- New APIs such as JMI, JOLAP, JDATAMINING
- SOAP development: marries HTTP and XML
E-‘Muddleware’
Architect’s Dilemma
- What is the data exchange protocol?
- Ignore the middleware when you are doing Design and Analysis; use Mapping techniques.
- Integration at higher levels is as important as at lower levels.
- XML won’t solve all the problems! Others will not go away.
AUDIENCE
QUESTION (Q): What is your definition of components?
INSTRUCTOR ANSWER (A): Pieces of a program with interfaces that have been
captured.
SPE – Software Process
Engineering: Best practices forming Objects for life cycle – an extension of
UML. (IBM, Rational Rose and others)
Q: What about Open Process?
A: Not involved with SPE but it is with UML 2.0.
Q: Data Structure?
A: There is no model that fits it all! UML can define what the data structures
are. It addresses the static part of it. If you can represent your legacy in UML
then you can use XML.
Q: What about Workflow Management?
A: UML includes activity diagrams and state machines.
Q: What about Batch File Model?
A: Look at the CWM model; it is more focused on extraction and transformation. UML
is weak here.
Model the Data,
Model the Application, and Model the Interface
Every three years comes a new
protocol. The guts of business rules change very slowly because they are
abstract concepts of the business. It is fundamental to focus on your business.
Enterprise Portals are in a
rudimentary stage now. The elements are already in CWM. Integration technology
brings together process, content and applications. It is not Data or Process or Presentation
integration, but all of them.
Work together with common shared metamodels
- There is more & more meta data lurking everywhere!
- CWM contains specific meta data for managing the DW, making meta data clearer and easier to use and represent.
IDA
– Enterprise Modeling from OMG
OMG Modeling and Meta data Framework
Modeling Concepts:
- Platform Independent Model (PIM)
- Platform Specific Model (PSM)
Meta data technology
- Mappings from Platform Independent Model to Platform Specific Model:
1) Create a concrete mapping from neutral to specific through data model and rules
2) UML profiles: AD going from neutral to specific (UML -> C++, Smalltalk, Java)
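The PIM-to-PSM mapping idea can be sketched as one platform-neutral model plus a set of mapping rules for each target platform. Everything here, including the Entity/Attribute classes, the type names and the rule tables, is a hypothetical illustration. It is not the OMG's MDA or CWM machinery.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Attribute:
    name: str
    pim_type: str       # platform-independent type: "String", "Integer" or "Date"

@dataclass
class Entity:
    name: str
    attributes: List[Attribute]

# One set of mapping rules per target platform (illustrative only)
TYPE_RULES = {
    "sql":  {"String": "VARCHAR(255)", "Integer": "INTEGER", "Date": "DATE"},
    "java": {"String": "String", "Integer": "int", "Date": "java.time.LocalDate"},
}

def to_sql(entity: Entity) -> str:
    """PIM -> relational PSM: one table, one column per attribute."""
    rules = TYPE_RULES["sql"]
    cols = ",\n".join(f"  {a.name} {rules[a.pim_type]}" for a in entity.attributes)
    return f"CREATE TABLE {entity.name} (\n{cols}\n);"

def to_java(entity: Entity) -> str:
    """PIM -> Java PSM: one class, one private field per attribute."""
    rules = TYPE_RULES["java"]
    fields = "\n".join(f"    private {rules[a.pim_type]} {a.name};" for a in entity.attributes)
    return f"public class {entity.name} {{\n{fields}\n}}"

customer = Entity("Customer", [
    Attribute("customerId", "Integer"),
    Attribute("name", "String"),
    Attribute("since", "Date"),
])
print(to_sql(customer))
print(to_java(customer))
```

The neutral model never changes when a new platform is added. Only a new rule table and generator are needed.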
Q: Which one is the META META
model?
A: MOF Meta Meta Model, it is a subset of UML.
Q: What about legacy ER with
UML -> Use Case?
A: When you deal with data, a bridge is needed. Work is ongoing to map UML and
ER; CA, Rational and Sybase are supporting it. You need to make decisions to map
models, because there is a mismatch. CWM includes UML, ER and Transformation Model. The
heart is MOF.
Q:
What tool are you using to generate XMI and IDL?
A: Rational Rose
Roles of UML in CWM
CWM 1.0 Overview {02/2001}
Common Warehouse metamodel
Q: Where do I
see security?
A: It is part of the systems management.
Q: Is this the persistent
metamodel?
A: Yes
Q: Notation, classes becoming
stereotype in UML?
A: Yes.
Tutorial 8
Data Architectures for Scalable E-Commerce
Speaker: Michael Stonebraker, Chief Technology Officer, Cohera Corporation
Summary by Linda Kresl
In this full-day tutorial Dr. Stonebraker predicted that
the US will lead B2B eCommerce. Major B2B players are Ariba, CommerceOne,
Oracle, SAP and IBM. He covered data architecture designed for eCommerce (B2C and B2B):
its inception, types of products (portals, DBMS, protocols, components, N-tier
architectures) and the standards associated with eCommerce.
A typical B2C application is a catalog of items for
sale that users query. B2C players are Broadvision and Openmarket. The interface is usually to a
fulfillment system. Gizmos like Palm Pilots and cell phones will be major
players in the future.
Any web architecture should be designed using components.
For the component protocol, JavaBeans is a safe bet for
general-purpose applications. Don’t build your components in ActiveX; it is
not supported by any non-MS OS.
Another choice is XML as a component protocol. XML is also
a messaging system, and XML will soon be ubiquitous, even on gizmos. XML goes
through firewalls and it’s easy to parse. XML is a safe bet for low-performance
applications; use it only for small and slow applications. XML
isn’t a good idea for large amounts of data because the meta data is coupled
with the data. He favors Java as a web language, while C++ will be used for complex
applications. His preferred scripting languages are JavaScript and XSL.
These products are ODBC compliant and talk to the DBMS. Michael suggests that
you stay with ODBC to move from database to database.
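Michael's point that XML couples the meta data with the data can be made concrete. The sketch below serializes the same records as CSV and as XML and compares their sizes. The order records are invented, and the exact ratio depends on tag-name length and values, so no particular figure should be read into it.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["order_id", "sku", "qty"]
rows = [{"order_id": str(i), "sku": f"SKU{i:04d}", "qty": str(i % 7 + 1)} for i in range(1000)]

def as_csv(rows):
    """Meta data (column names) appears once, in the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def as_xml(rows):
    """Meta data (tag names) is repeated around every single value."""
    root = ET.Element("orders")
    for r in rows:
        order = ET.SubElement(root, "order")
        for name in FIELDS:
            ET.SubElement(order, name).text = r[name]
    return ET.tostring(root, encoding="unicode")

csv_size = len(as_csv(rows))
xml_size = len(as_xml(rows))
print(f"CSV: {csv_size} chars, XML: {xml_size} chars ({xml_size / csv_size:.1f}x)")
```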
Components can run in 3 areas:
1. Thick client – on a browser – screen-intensive logic should run as close to the screen as possible
2. Thick middle – applications that are in between should run in middleware
3. Thick database – an OR (object-relational) DBMS – data mining should be run as close to the database as possible. Logic in the DB is always faster!!! Move the code to the engine!
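The "move the code to the engine" advice can be illustrated with Python's built-in sqlite3 module standing in for the DBMS. This is an assumption, since SQLite is not an object-relational DBMS. The sketch computes the same totals twice, once by pulling every row into the client and once by pushing the GROUP BY into the engine, and counts how many rows cross the boundary. It shows data movement, not a timing benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 4.0), ("east", 2.5), ("west", 7.5)] * 250,
)

def totals_in_client(conn):
    """Pull every row out of the DBMS and aggregate in application code."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals, len(rows)

def totals_in_engine(conn):
    """Push the aggregation into the engine; only the answer crosses the boundary."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
    return dict(rows), len(rows)

client_totals, client_rows = totals_in_client(conn)
engine_totals, engine_rows = totals_in_engine(conn)
print(f"client-side: {client_rows} rows transferred -> {client_totals}")
print(f"in-engine:   {engine_rows} rows transferred -> {engine_totals}")
```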
Michael states that the obvious goal is Universal
components. Write the component once and reuse it at any level. The industry is
nowhere near universal components. JavaBeans are the closest component at this
point in the game.
How should we interface to legacy systems? We can use two
approaches: an EAI system or a messaging system. Please use your favorite EAI
system. An EAI system helps you package up a message, transform it as it moves across the network,
and have the receiving side unpack it and understand it. The top EAI packages are: MQSeries
(IBM), webMethods, Vitria, CommerceQuest, CrossWorlds, Mercator.
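The package / transform / unpack cycle that an EAI product automates can be sketched as three functions. The fixed-width legacy layout, the field names and the use of JSON as the envelope format are hypothetical assumptions. Commercial EAI products use their own message formats and mapping tools.

```python
import json

# Hypothetical fixed-width layout of a legacy record: (field, start, end)
LEGACY_LAYOUT = [("policy_no", 0, 8), ("surname", 8, 20), ("premium_cents", 20, 28)]

def package(record: str, source: str) -> str:
    """Sender side: cut the legacy record into fields and wrap it in an envelope."""
    body = {name: record[start:end].strip() for name, start, end in LEGACY_LAYOUT}
    return json.dumps({"source": source, "type": "PolicyUpdate", "body": body})

def transform(message: str) -> str:
    """In transit: rename fields and convert units to what the receiver expects."""
    msg = json.loads(message)
    body = msg["body"]
    msg["body"] = {
        "policyNumber": body["policy_no"],
        "lastName": body["surname"].title(),
        "premium": int(body["premium_cents"]) / 100,
    }
    return json.dumps(msg)

def unpack(message: str) -> dict:
    """Receiver side: open the envelope and use the body."""
    return json.loads(message)["body"]

legacy_record = "P0001234" + "SMITH".ljust(12) + "00012550"
print(unpack(transform(package(legacy_record, source="IMS"))))
```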
Content is locally authored information: rich content (text and images)
with little if any structure. This data is fairly static, and it may also be
purchased. There are two approaches to content management:
1. Store content in HTML/XML via a file system (don’t grow your own). Packages are Plumtree, Viador, Interwoven, Vignette.
2. Object-Relational DBMS – use this if you have an enormous amount of content; these are scalable.
The Web changes data warehousing with a new set of data:
clickstream analysis (CSA). Every time a user clicks to a new page, the click is
stored. This data source is outside the enterprise; now the data is
outside the firewall. CSA looks exactly like traditional data warehousing. Web
site scraping is used to get data from web sites, which is a way to get the data
if the enterprise doesn’t own it. One of the weaknesses of a DW is that its
data is stale by ½ the refresh interval on average, and the scalability issue is great. Trends
in this space include automatic data mining; federators should also get traction, as
should visualization systems that complement data cubes.
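The "stale by ½ the refresh interval" figure follows from a simple average. Assume (our notation, not the speaker's) that the warehouse is refreshed every T hours and that a query arrives at a uniformly random moment t after the last refresh, so the data it sees is t old:

```latex
\text{age}(t) = t, \quad 0 \le t < T
\qquad\Longrightarrow\qquad
\overline{\text{age}} = \frac{1}{T}\int_{0}^{T} t \, dt = \frac{T}{2}
```

For a nightly refresh (T = 24 hours), queries therefore see data that is 12 hours old on average.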
Michael suggests the following to improve web design.
1. Plan for short design cycles. Web cycle time appears to be about 3 months, and the rapid prototyping mentality is really required.
2. Scalability is key. Test a design for scale before it goes live. Make sure that you hire serious system software expertise. Availability is a must: replicate your data and make sure to turn RAID on.
3. Do only what you are good at. Figure out your core competency and out-source everything else.
4. Do everything only once. This means run one ETL system, one EAI system, one Federator, etc.
5. Fewer islands of information. Use fewer system administrators, less training, fewer manuals, etc. Converge federator and EAI, and converge app server into OR DBMS.
6. Use XML appropriately: use it as a transport protocol, not a storage format.
CONFERENCE SESSIONS
KEYNOTE
Speaker: Tom DeMarco, Principal, Atlantic Systems Guild
Summary by Linda Kresl
Tom kicked off the Meta-Data/DAMA conference with
flair. He said the systems we build today
are characterized by more stakeholders, conflict, shorter schedules, tighter
budgets, more visibility, and risk. And modern-day systems are harder because we
built all the easy ones years ago.
The
major point that Tom is making in this presentation is that we need to introduce
“slack” in our work environment. His definition of slack is the degree of
freedom (in time and budget and manpower and space, etc.) necessary to make
change possible.
What is a quality focus today? Most of our quality
programs focus on defects. How do we live with the fact that many of our
products are chock full of defects? Take Microsoft’s IE, for example. Does the
software transform your world? If it transforms the way a person does one
particular thing, the fact that it has defects is of no consequence.
We must consider human capital as the most important asset
of an agile organization. The agility principle is based on prioritization.
Tom’s view on priority is a great departure from the norm. He suggests rank-ordering
priorities and putting projects on hold when their priority doesn’t
justify doing them yet.
Tom’s Prescription for a new era
· Become less “efficient”
· Lighten process (strive for light process and heavy skills)
· Learn to Prioritize
· Choose your projects very wisely; what you decide not to build is more important than how you build
· Invest in human capital
People must spend time thinking today. Tom spent one whole
summer just thinking. Don’t spend all your efforts strategizing. Everyone
should put some slack back into their life, and some slack back into their
organization.
Conference Session
Presenter: Andres Perez, Enterprise Data Architect, USAA
Summary by Linda Kresl
Andres is chartered with bringing more rigor to USAA’s
data architecture. USAA is an automobile insurance agency that prides itself on
serving its members with superior information. Andres hopes that what he shared
will be something that you can take home and use in your own
organization.
He discussed the fact that they have a large IMS legacy
system. It is extremely difficult to do data mining with data in this format.
Much of the data isn’t defined correctly and it is conflicting. Semantic
problems are those in which data attributes don’t match up from different
reports. Also the data is constrained to a given channel. The web may help
alleviate this problem.
Most of the applications at USAA have more interfaces than
users. One application alone has 4,500 interfaces. There are several
translations that must take place for any single application to run. This has
created a fur ball of data! Andres states that 50% of the total IT budget is
spent maintaining the interfaces.
The single reason that data is inconsistent is what
individuals believe their business processes are. Every individual truly
believes they are doing the right thing, when in reality they are not doing what
is best for the business. Because of the focus on projects and not the
enterprise, USAA has redundant data.
USAA’s data architecture is based on the Zachman
Framework. USAA still has many obstacles to overcome in understanding its customers’ needs.
By implementing the Zachman Framework they hope to understand and relate
relevant data. Andres is proposing a common data model and definitions, as well as
a reference guide to manage and control meta data.
The desired data architecture for USAA is creating data
structures that are subject oriented and in canonical form. Once the data is
moved to subject areas Andres proposes creating data marts based on these
subject areas.
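A back-of-the-envelope count, which is our own illustration rather than figures from the talk, shows why a canonical form helps with the interface "fur ball". If n applications each exchange data directly with every other, directed point-to-point interfaces can grow as n(n-1). If each application instead translates once into and once out of a shared canonical form, about 2n translations are needed:

```latex
I_{\text{point-to-point}} = n(n-1), \qquad I_{\text{canonical}} = 2n;
\qquad n = 50:\quad 50 \cdot 49 = 2450 \;\text{ vs. }\; 2 \cdot 50 = 100
```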
Conference Session
Data Quality as a Profit Center
Speaker: Wendy Wood, Data Quality Analyst, SBC Services
Summary by Margaret O’Hara
Wood began her presentation with a comment about “dirty
data” being a renewable resource, and thus offering her job security. She then
explained the mission of her department at her firm: data quality. She discussed
how high data quality can help the company achieve its goals of faster and
better market response, improved business flowthrough and customer delight.
Wood believes that the main questions to ask in your
company are: (1) Are you getting the data you’re expecting, and (2) what is it
worth to you and your company? Wood believes that customer addresses are a good
place to start a data quality initiative because most firms have address data,
and many areas of the company have problems with it. At PacBell,
customer addresses are a major issue, because a single customer may have
up to three addresses: the service address, the billing address, and the listing
address. While the company can handle “less-than-perfect” addresses for
service (e.g., the second house behind the gas station on the corner), the Post
Office cannot. More importantly, discounts available for complete addresses were
threatened.
To correct the problem, Wood found users who cared about
the data. She stressed that data quality was not something that IT could achieve
by itself; a user-sponsor was critical. She urged the audience not to take such
projects on themselves, but to be sure there is buy-in from the business. She
then briefly stepped the audience through the cleansing process. First, take the
highest-level “one” table; country is a good example, as it has few
values. Examine and correct the data in that table, then move down to state,
then to city, and so on. She cautioned that data quality issues in the lowest-level
tables are the hardest to identify. She also advised looking
at small samples (perhaps 10% of the data) before undertaking the project. At
the very least, such an examination will allow the firm to learn more about its
data.
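Wood's top-down approach can be sketched as below: look at a sample first, then check the small "one" tables and work down the hierarchy. The reference lists, the 10% default and the sample addresses are hypothetical. A real project would check against authoritative reference data.

```python
import random

# Hypothetical reference data, ordered from the top of the hierarchy down
VALID_COUNTRIES = {"US"}
VALID_STATES = {"US": {"CA", "NV"}}
VALID_CITIES = {("US", "CA"): {"San Francisco", "Oakland"}, ("US", "NV"): {"Reno"}}

def draw_sample(rows, fraction=0.10, seed=2001):
    """Look at a small random sample (10% by default) before committing to a project."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)

def profile_top_down(rows):
    """Check country first, then state, then city.

    A row is only checked at a level once every level above it is valid, so
    problems are fixed from the small "one" tables downward.
    """
    issues = {"country": [], "state": [], "city": []}
    for r in rows:
        if r["country"] not in VALID_COUNTRIES:
            issues["country"].append(r)
        elif r["state"] not in VALID_STATES[r["country"]]:
            issues["state"].append(r)
        elif r["city"] not in VALID_CITIES[(r["country"], r["state"])]:
            issues["city"].append(r)
    return issues

addresses = [
    {"country": "US", "state": "CA", "city": "Oakland"},
    {"country": "USA", "state": "CA", "city": "Oakland"},   # bad at country level
    {"country": "US", "state": "XX", "city": "Reno"},       # bad at state level
    {"country": "US", "state": "NV", "city": "Reeno"},      # bad at city level
] * 5
sample = draw_sample(addresses)
for level, bad in profile_top_down(addresses).items():
    print(level, len(bad))
```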
Conference Session
Introduction to the Unified Modeling Language
Speaker: Eric Naiburg, Rational Software
Summary by Anne Marie Smith
Eric Naiburg (presenting for Terry Quatrani, who was
unable to attend due to weather) introduced the concepts of the Unified
Modeling Language, how it can be used, and some examples of UML in modeling.
History of UML: created by Booch, Rumbaugh and Jacobson
– all were working on methodologies / languages for visualizing, specifying,
constructing and documenting the artifacts of a software system. These
methodologies were synthesized with the assistance of Rational Software Corp.,
and have evolved into a unified format, notation and language designed for
modeling applications and data.
Eric explained the various diagrams in the UML:
Activity Diagrams: show the flow of
control in a system, from start to finish. They represent processes (activities)
and the order in which each occurs. An activity diagram can also be used to
illustrate the data entities needed, as the basis for database design.
Use Case Diagrams: Use cases
and actors are the 2 components of a use case diagram. An actor is someone or
something that must interact with the system to perform an action. A Use Case is
a pattern of behavior that the system can exhibit. Each use case is a sequence
of related transactions performed for an activity, involving one or more than
one actor. Use cases are a high-level requirements gathering and documentation
method, and are essential to object-oriented system development.
Sequence Diagrams: Displays
object interactions in the order in which they will be performed.
Collaboration Diagrams:
Displays object interactions organized around objects and their links to one
another.
Class Diagrams: Shows the
existence of a class and its relationships in the logical view of a system.
Classes are collections of objects with a common structure, common behavior and
common relationships. Eric explained the concepts of association, aggregation,
dependency and inheritance relationships in classes. Eric mentioned the
similarity between entities and classes, to demonstrate the commonality between
ER modeling and UML modeling. He showed the essential nature of “classes” in
object-orientation, and the modeling of classes and relationships within the
UML.
State Diagrams: Shows the life history of an application
and is similar to an activity diagram at a point in time. This diagram type is
not used as frequently as activity diagrams or sequence diagrams for application
development; it is more frequently used for networking implementation.
Component Diagrams: Shows the
physical implementation of a class and its actions (DLL, programs, interfaces).
Deployment Diagrams: Shows the processors and devices used in implementing a
system.
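The class-diagram relationships Eric described (association, aggregation, dependency and inheritance) map onto code roughly as follows. This Python sketch uses invented classes. Python has no separate syntax for aggregation versus plain association, so the comments mark the intent.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Party:
    name: str

@dataclass
class Customer(Party):            # inheritance: a Customer is a kind of Party
    credit_limit: float = 0.0

@dataclass
class Product:
    sku: str
    price: float

@dataclass
class OrderLine:
    product: Product              # association: each line refers to one Product
    quantity: int

@dataclass
class Order:
    customer: Customer            # association: each Order is placed by one Customer
    lines: List[OrderLine] = field(default_factory=list)   # aggregation: an Order is made up of lines

    def total(self) -> float:
        return sum(line.product.price * line.quantity for line in self.lines)

class InvoicePrinter:
    def render(self, order: Order) -> str:   # dependency: uses Order only as a parameter
        return f"{order.customer.name}: {order.total():.2f}"

order = Order(Customer("Acme", credit_limit=1000.0))
order.lines.append(OrderLine(Product("A1", 2.5), 4))
order.lines.append(OrderLine(Product("B2", 1.25), 2))
print(InvoicePrinter().render(order))
```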
Eric concluded by explaining some of the extensions to the
UML that are frequently used, and discussed how to bring UML and its concepts
into the “data world”. He cited the universality of the UML in business
modeling, requirements modeling and application development. He encouraged
attendees to learn more about UML and to apply its concepts and techniques in
their data activities.
Questions for Eric were mostly technical and
documentation-oriented, and showed the high level of interest in the UML and its
place in data management.
Conference Session
Implementation/Use of Operational Meta Data to Improve Data Quality in the Data Warehouse
Speaker: Mike Jennings, Architect and Manager, Hewitt Associates LLC
Summary by Ron Klein
Mike Jennings discussed the Meta Data Repository (MDR) and
the Data Warehouse. He assumes that the MDR should be independent of both ETL
tool selection, and of the Dimensional Modeling technique used.
The purposes of the Repository in the BIE (Business
Intelligence Environment) are:
- To allow the various function areas in the data warehouse environment to communicate, through the repository product and its data model
- To provide context to the data content, processes and reports
- To act as the central hub of the data warehouse environment
- To allow project teams to focus on the operational source system and data warehouse data models, not the repository
- To provide a single location for integration between the operational source systems, data warehouse, ETL processes, business views, reports, and operational statistics
Mike presented
a Generic Meta Data Repository Model (see slide #8 in the speaker’s materials
on the conference CD-Rom). He reviewed the various types of business meta data
(e.g. Business terms and definitions for tables and columns, subject area names,
query and report definitions, report mappings) and technical meta data (Physical
table and column names, Data mapping and transformation logic, Source systems,
Foreign keys and indexes, Security, ETL process names).
Operational meta data is an extension of
the design and architecture of the data warehouse that provides processing
optimizations in data acquisition design, maintenance activities, end-user
reconciliation and auditing of information. It provides an extra bridge
between the meta data repository and the data warehouse through the addition of
physical columns in the design, for ease of use by both technical and business users. Use of operational
meta data will require additional ETL processing steps and time. If a meta
data repository cannot be extended for operational meta data, or is not
available, lookup tables can be used as an alternative in the warehouse model. Operational
meta data provides a detailed, micro-level explanation of the information
content in the data warehouse. The key distinction of this method is the direct
association of meta data with each row in the data warehouse, which allows a
detailed (row-level) explanation of information content, versus the table/column-level
explanation a repository provides.
Transforming
the Logical Data Model into the Data Warehouse Data Model
There are
eight (8) basic Inmon transformation rules to be applied to the Logical Data
Model in order to convert it into a Data Warehouse Data Model. These
transformation rules should typically be applied in sequence. Mike’s own
“modified” version of these rules is:
1. Removal of purely operational data
2. Addition of an element of time to the key structure and operational meta data
3. Addition of derived data
4. Transformation of data relationships into artifacts
5. Accommodation of different levels of granularity
6. Merging like data from different tables
7. Creation of arrays of data
8. Separation of data attributes based on their stability
Operational Meta Data Examples
There can be various technical meta data columns (tags) utilized in the data warehouse data
model and ETL processes for enhanced automated support:
- Load Cycle Identifier
- Current Flag Indicator
- Load Date
- Update Date
- Operational System(s) Identifier
- Active in Operational System Flag
- Confidence Level Indicator
- Cyclic Redundancy Check (CRC)
These columns are added during transformation of the Business Logical model into the
Dimensional or Data Warehouse data model. Use of certain operational meta data
depends on the type of table in question (e.g., an Update Date on a fact table
would add little value, since these tables are not typically updated in a
standard warehouse). Mike discussed an example of a strategy for using operational
meta data with slowly changing dimensions (SCD), which can be reviewed in his
paper on the conference CD.
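The operational meta data columns listed above, and their use with slowly changing dimensions, can be sketched as a Type 2 load. Each row is stamped with a load cycle identifier, current flag, load and update dates, source system identifier and a CRC used for change detection. This is a hypothetical illustration, not Mike's strategy from his paper. The column names, the list-of-dicts "dimension" and the assumption of one source row per key per load are ours.

```python
import zlib
from datetime import date

TRACKED = ("name", "address")   # hypothetical business columns whose history is kept

def row_crc(record):
    """CRC over the tracked business columns, used to detect changed rows cheaply."""
    payload = "|".join(str(record[c]) for c in TRACKED).encode("utf-8")
    return zlib.crc32(payload)

def load_scd2(dimension, source_rows, load_cycle_id, load_date, source_system):
    """Apply one load cycle to a list-of-dicts dimension, keeping Type 2 history.

    Assumes at most one source row per customer_id in each load cycle.
    """
    current = {r["customer_id"]: r for r in dimension if r["current_flag"] == "Y"}
    for src in source_rows:
        crc = row_crc(src)
        existing = current.get(src["customer_id"])
        if existing is not None and existing["crc"] == crc:
            continue                            # unchanged: nothing to load
        if existing is not None:
            existing["current_flag"] = "N"      # close out the old version
            existing["update_date"] = load_date
        new_row = {c: src[c] for c in ("customer_id",) + TRACKED}
        new_row.update(
            load_cycle_id=load_cycle_id,        # Load Cycle Identifier
            current_flag="Y",                   # Current Flag Indicator
            load_date=load_date,                # Load Date
            update_date=None,                   # Update Date (set when superseded)
            source_system_id=source_system,     # Operational System Identifier
            crc=crc,                            # Cyclic Redundancy Check
        )
        dimension.append(new_row)
    return dimension

customer_dim = []
load_scd2(customer_dim, [{"customer_id": 1, "name": "Ann", "address": "1 Elm St"}], 1, date(2001, 3, 1), "BILLING")
load_scd2(customer_dim, [{"customer_id": 1, "name": "Ann", "address": "9 Oak Ave"}], 2, date(2001, 3, 2), "BILLING")
for row in customer_dim:
    print(row["address"], row["current_flag"], row["load_cycle_id"], row["update_date"])
```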
Conference Session
Speaker: David Hay, President, Essential Strategies
Summary by Carey Clark
David Hay creates the most readable data models in the
world (in this author’s humble opinion). In this presentation he presented over
30 logical models and meta models covering all aspects of the information
systems development process itself. Models presented describe the entities and
relationships of the artifacts created during analysis, design, and programming.
He also showed models for data transformations, business rules, screen design,
and object oriented programming. Doing this not only provides a basis for
storing the relevant meta data that would reside in a repository, but also goes
a long way in helping us to understand what we ourselves do.
David avoids the term “meta data” in reference to
repositories. He thinks it’s too restricted. Instead he adopts Michael
Brackett’s designation, the “Data Resource Repository”.
He reviewed historical efforts to create a repository and
provided his assessment of their success. The OIM and OMG versions he felt were
too abstract. They hold lots of stuff but not the stuff a typical data modeler
would recognize. Oracle Designer is promising. TDAN and Aera Energy were
potentially workable. But he decided to have a go at it himself.
He plugged the TDAN newsletter at www.TDAN.com as required
reading; his own three articles on his Repository Models are there as well. He
started simple and progressed to more and more elaborate repository meta
models. All are worth studying, and I recommend viewing them.
He contends that UML is only a data modeling notation and
that there is nothing fundamentally different about it compared with other notations. It does some
things okay but is not easy to read, so he prefers the ER (crow’s foot)
notation instead. UML also tends to focus the modeler on the application
(physical) rather than on the business (logical).
Using the models, Dave explained how certain issues were
handled. For example, there is the need to describe elements that
initially may be populated (optional) but eventually must be populated (mandatory). Most
tools make you decide one way or the other up front. He includes derived data in
his model; whether that data is derived when viewed or stored is an
implementation decision. The logical model is the same.
The problem with most meta models is that they are too
abstract for anyone but data modelers. In order to make models readable to the
user community he added the concept of “virtual entities” that derive from
the abstract one. Thus one can display the entity Customer in a model
view, even though Customer is really the Role of a Party (where
Party is a Person or Organization).
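David's "virtual entity" idea can be sketched as a derived view: Customer is not stored as its own table but computed from Party and a Party Role. The class and role names below are hypothetical, chosen only to mirror the Customer/Party example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Party:
    party_id: int
    name: str
    kind: str        # subtype discriminator: "Person" or "Organization"

@dataclass
class PartyRole:
    party_id: int
    role: str        # e.g. "Customer" or "Supplier"

def customers(parties: List[Party], roles: List[PartyRole]) -> List[Party]:
    """Virtual entity: derive 'Customer' from Party plus PartyRole instead of storing it."""
    customer_ids = {r.party_id for r in roles if r.role == "Customer"}
    return [p for p in parties if p.party_id in customer_ids]

parties = [
    Party(1, "Acme", "Organization"),
    Party(2, "Jane Doe", "Person"),
    Party(3, "Parts Inc", "Organization"),
]
roles = [PartyRole(1, "Customer"), PartyRole(1, "Supplier"),
         PartyRole(2, "Customer"), PartyRole(3, "Supplier")]
print([p.name for p in customers(parties, roles)])
```

Because roles are rows rather than tables, the same party (Acme here) can be both a customer and a supplier without duplicating its data.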
He believes that use cases are awkward because they
assume you understand the process you’re modeling. They are essentially
context level data flow diagrams but lack some of the formality and rigor.
Dave is currently working on a business rules meta model
with the Business Rules Group. This group is sort of a replacement for Guide.
Check it out at businessrulesgroup.org.
Not everything about a business belongs in a Repository. He doesn’t claim his models cover every possible modeling subject; for example, work flow models, events and policies might be better stored in their own data store. In none of his models does one see foreign keys. A foreign key is a mechanism for implementing relationships; at the logical level it is implied by the relationship link, so putting it in the model is redundant.
His models are particularly readable and elegant. He uses
Oracle Designer, which allows subtypes to be nested and entities to be stretched so
that relationship lines rarely overlap and never bend. The bad news is that
it’s expensive.
Conference Session
Speaker: Alan Perkins, Vice President, Visible Systems
Summary by David Plotkin
This presentation introduced the basics of XML, including
the fact that it is content-based, not presentation-based. It also identified
what tags are used for, and briefly discussed Elements, Attributes, and
Entities, with examples.
The main point of the talk is that XML Without Fear is
based on documenting Enterprise Meta data in the form of business rules. The
types of business rules were listed, including definitions, data integrity
constraints, derivations, inferences, processing sequences, and relationships
among facts. The presentation discussed the advantages of managing business
rules, and the characteristics of a "good" business rule.
The bulk of the presentation discussed modeling of
business rules. In general, constraint-type business rules and derivations
cannot be modeled in a "standard" data modeling tool. However, using
Visible Systems' tool, Alan demonstrated how data modeling could be extended to
model these types of "impossible to model" business rules.
Conference Session
Data Management Support for Enterprise Architecture
Speaker: Brett Champlin, Architecture Consultant, Allstate Insurance Company
Summary by Linda Kresl
This presentation offered valuable insights on how your
company can manage the data for your enterprise architecture. Brett’s examples
from Allstate Insurance give practical suggestions to handle this difficult
task. The key is to manage the models that support the architecture, but an
Enterprise Architecture is much more than just models. Enterprise architecture
is models, principles, and standards. It includes data and process modeling and
application and technologies architecture.
In this presentation Brett explained architecture
definitions. His first definition was an engineering definition of architecture
– the art and science of building. And the purpose of architecture is to
convey a design. Information systems architecture is the blueprints, drawings
and models, which define and describe what is needed.
Brett presented many schematics and diagrams to show
different architectural frameworks, e.g. Zachman, Gorman’s Knowledge Worker,
Framework for 3-tier C/S development. Brett compared Enterprise architecture to
city planning, comparing the buildings in a city to systems in an enterprise.
The most important element is the infrastructure – what is underneath
supporting the buildings and systems.
Data management support includes defining the processes,
choosing a framework, and integrating the EA with key business processes. Brett
mentioned several tools to help manage the EA, including
Corporate Modeler by CASEwise, Metis by NCR, and Architect by ZTI.
Conference Session
Business Rule Specification, Validation & Transformation: Advanced Aspects
Speaker: Terry Halpin, Technical Lead in Database Design, Microsoft
Summary by Margaret O’Hara
Halpin began his presentation by asking the audience how
many used data use cases and object-role modeling (ORM) in their work. About a
third of the audience had used them. Halpin's basic premise was that data use
cases and ORM are:
- more understandable, because they state facts and rules in English and/or
intuitive graphics
- more reliable, because they validate rules using English and sample populations
- more expressive, because they capture more business rules graphically
- more stable, because they minimize the impact of change on models.
Halpin used the example of a birth date. Instead of stating
that a person has a birth date, ORM verbalizes the fact as "I was born on ____",
a much more natural way for the user to state it. For the remainder of
the presentation, Halpin presented ORM examples.
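Halpin's point that ORM rules can be validated with sample populations can be illustrated with a small sketch. It assumes a fact type "Person was born on Date", stored as (person, date) pairs. It checks one uniqueness constraint (each person was born on at most one date) against a sample. The function names and data are hypothetical and are not taken from any ORM tool.

```python
from collections import defaultdict

def verbalize(person, date):
    """Render one fact instance of 'Person was born on Date' as an English sentence."""
    return f"{person} was born on {date}."

def uniqueness_violations(facts):
    """Return people who appear with more than one birth date in the sample population.

    Rule under test: each Person was born on at most one Date.
    """
    dates = defaultdict(set)
    for person, date in facts:
        dates[person].add(date)
    return sorted(p for p, ds in dates.items() if len(ds) > 1)

# Hypothetical sample population, including one counterexample for Ann.
sample = [("Ann", "1960-02-14"), ("Bob", "1975-07-01"), ("Ann", "1961-02-14")]
for person, date in sample:
    print(verbalize(person, date))
print(uniqueness_violations(sample))  # ['Ann']
```

A violation in the sample is the kind of counterexample a modeler can read back to a domain expert in plain English to confirm or reject the rule.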
In his concluding remarks, Halpin stated that ER was
useful for basic data modeling, but that commercial ER versions were limited
in the business rules they could express. UML is useful for OO code design but not for
information analysis, as its use cases are too process-oriented. For ER and
UML users, Halpin suggested three options: use ORM for analysis and then map to ER or UML;
supplement ER and UML with data use cases; or enhance ER and UML to make them
more ORM-like.
|
Conference Session |
Speaker |
|
Business Process Analysis and Logical Process Modeling |
Anne Marie Smith, Assistant Professor, LaSalle
University |
Summary by Anne
Marie Smith
Anne
Marie Smith, assistant professor of MIS at LaSalle University and a data
architect consultant, gave an overview of the concepts of business process
analysis and its relationship to data analysis, with a brief overview of the
methods used to model logical processes and that model’s relationship to a
logical data model.
Anne Marie noted that process analysis should be used in
all systems development, whether transaction processing or decision support/data
warehousing, and for traditional as well as electronic commerce
applications. She cited the failure rate of application development projects of
all types, and the lack of understanding of the underlying processes that
frustrates both the user and IT communities.
Business Processes do not operate in a vacuum: they need
data to validate the reason for the processes’ existence. As such, Anne Marie
described the interaction between data analysis and process analysis, and the
need to have BOTH analyses for full application development and user
effectiveness.
Anne Marie drew on actual experiences from her consulting and information
management career to demonstrate the interaction between data and process in
successful implementations across different types of development.
With a very brief overview of logical process modeling,
Anne Marie introduced this method to the data analysts in attendance. She
concluded by reiterating the ideas from the introduction and by relating the
need to understand processes to data analysts' existing appreciation of the need
for data analysis.
Some reactions/questions to this presentation showed that
DAMA needs more exposure to processes and processes’ intimate relationship to
data – more process-oriented presentations were requested for future
conferences.
|
Conference Session |
Speaker |
|
Build Your Own Web-Based Meta Data Repository |
Joseph Newcum, Senior
Data Architect, Bank One |
Summary
by Carey Clark
There are several reasons to build your own repository
rather than buy one: vendor products tend to be costly and can be difficult
to modify. On the other hand, building your own requires the in-house skill and
patience to see the project through.
Joseph separates meta data into operational and
development meta data. Operational meta data deals with the flow of information
in the enterprise, such as loading a data warehouse; these activities happen day
in and day out. Development meta data concerns the creation of applications: the
analysis, models, and constructs used on a project. Your repository will be
different depending on your emphasis.
Bank One spent two years evaluating third-party
repositories. Their focus was using the repository to build a data warehouse.
Arriving at a working repository wasn't straightforward; it took four tries. The
first failed because it was too difficult to load data from their CASE tools.
The second failed for lack of skilled object-oriented programmers. The third was a
purchased repository that didn't fill the bill. The fourth try succeeded.
The successful approach to building their repository was
to create a prototype in Microsoft Access, prove the design, and then rebuild it
in HTML and JavaScript for dissemination over the Web. They used Microsoft tools
(Active Server Pages, ActiveX Data Objects, Java, etc.). Their modeling tool is
ERwin. They have not yet incorporated XML.
Joseph walked through and discussed the various display
screens in the Access prototype. The initial application ended up smaller in
many ways because certain meta data simply wasn’t available. The resulting
application primarily supports a data warehouse environment.
They made the interface look like Business Objects. Users
were already familiar with it, so the learning curve was reduced. The user
interface is clean and robust; what goes on under the covers is something of a
jumble but is constantly being improved. He believes this is the right approach:
make the interface elegant and robust and don't worry so much about internals,
which you can change without affecting the end user. Right now they are
modularizing it into VB classes and moving data into business objects. Subject
matter experts input definitions directly.
He showed the meta models underlying the repository. They
started out as a very abstract thing-thing model used by KnowledgeWare's
Application Development Warehouse. Later it was redone to be less abstract.
An audience member asked whether the data models themselves are
viewable on-line. The answer was yes, but he found that few developers ever used
those views because the screens lacked the space and resolution. Instead, most of them
plotted the models on large plotter paper and pinned them up in their cubicles.
He recommended the books: Visual Basic 6 Business
Objects and Visual Basic 6 Distributed Objects. These, he said, would
be valuable for their architectural insights even if you didn’t use Visual
Basic.
|
Conference Session |
Speaker |
|
The Role of Data Administration in Managing the Enterprise Portal |
Arvind Shah, President, Performance Development Corporation |
Summary by David
Plotkin
This presentation defined the many kinds of personalized
portals (such as consumer, vertical, B2B, and corporate) and their purposes. It
discussed the typical problems with B2B portals and the roles of data
administration in solving these problems.
Some of these roles are typically considered part of data administration, and
some (like performance tuning, security, and supply chain standardization) are
not. The roles typically considered part of data administration included
Planning/Architecture Development, Content Management, and Information Quality Management.
Architecture Development consists of managing the enterprise architecture, establishing a process model, building the data model, setting up the business rules, and creating strategies for information, technology, and BPR initiatives. Content Management consists of managing data architecture, enforcing data standards, assuring data timeliness & quality, and assuring security levels. It also means managing meta data.
|
Conference
Session |
Speaker |
|
Developing
a Corporate Data Architecture in a Federated World |
Deborah
Henderson, IT Architect, Hydro
One Networks, Inc. & Vladimir Pantic, IBM |
Summary by Linda Kresl
Deborah presented first and described the business of
Hydro One Networks, a wholesale and retail electric utility. She gave
several examples of the work Hydro One is doing to define its data
architecture. They have a high reuse of data and processes across the
enterprise. She stated that they are leveraging their data warehouse, which is
the driver for the data architecture.
The data architecture is composed of local data, OLAP and
detail data, external and historical data, and the ODS source. Meta data ties
everything together.
The physical database architecture includes Oracle 8i,
RI, multidimensional cubes, and a meta data repository connected through hooks.
Hydro One is using IBM’s LOVEM methodology to develop
and document processes and implement procedures. This methodology tracks the
life cycle of these deliverables.
At Hydro One, business rules are implemented in the ETL (extract, transform,
load) process, which then feeds the data marts where additional information is
stored to support the data architecture.
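The summary says only that business rules live in the ETL, which then feeds the data marts. Below is a minimal sketch of that pattern, assuming rules are small validation functions applied during the transform step. The meter-reading fields, rule names, and usage-by-region mart are invented for illustration and are not Hydro One's design.

```python
from collections import defaultdict

# Hypothetical business rules: each returns an error message, or None if the record passes.
def has_meter_id(rec):
    return None if rec.get("meter_id") else "missing meter_id"

def valid_kwh(rec):
    kwh = rec.get("kwh")
    return None if isinstance(kwh, (int, float)) and kwh >= 0 else "missing or negative kWh"

RULES = [has_meter_id, valid_kwh]

def transform(records, rules=RULES):
    """Apply every rule to every record; split the input into accepted rows and rejects."""
    accepted, rejected = [], []
    for rec in records:
        errors = [msg for msg in (rule(rec) for rule in rules) if msg is not None]
        if errors:
            rejected.append((rec, errors))
        else:
            accepted.append(rec)
    return accepted, rejected

def load_usage_mart(accepted):
    """Aggregate accepted rows into a hypothetical usage-by-region data mart."""
    mart = defaultdict(float)
    for rec in accepted:
        mart[rec.get("region", "UNKNOWN")] += rec["kwh"]
    return dict(mart)

readings = [
    {"meter_id": "M1", "region": "East", "kwh": 10.0},
    {"meter_id": "", "region": "East", "kwh": 5.0},
    {"meter_id": "M3", "region": "West", "kwh": -1.0},
]
accepted, rejected = transform(readings)
print(load_usage_mart(accepted))  # {'East': 10.0}
print(len(rejected))  # 2
```

Keeping each rule as a separate function means a rule can be added or changed in one place. Every mart fed by the ETL then sees the same validated data.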
|
Conference Session |
Speaker |
|
Facilitation and the Successful Architect |
Shelly Lieberman, Director,
Strategic Directions, Mathtech |
Summary
by Margaret O’Hara
In this well-organized and entertaining presentation,
Lieberman shared her experiences at the Division of Alcoholic Beverage Control
(ABC) in NJ and the part that facilitation played in achieving a successful
business process reengineering effort. She began by defining facilitation as the
process of harnessing user knowledge and expertise in a group to accomplish
objectives and develop deliverables.
Her presentation included discussion of when and why one
should use facilitation, an overview of the ABC project, the facilitation
approach she used, the results of the facilitation sessions with the ABC and the
critical success factors for the sessions. The facilitation process consists of
careful planning, execution and follow-up, very often with the follow-up
activities feeding directly into the next planning session. Knowledge of the
organizational culture is critical, as not all techniques work in all cultures.
Not every session is facilitated; facilitation is reserved for those involving
major issues among the parties.
Once the sessions have been scheduled, it is important to
follow a strict agenda. Each session is split into three parts: an opening
module where the stage is set, the work module, and the closure module where
the wrap-up and summary take place. "Boarding" issues (writing them in
a public space in the room for everyone to see) often defuses conflict, because
people are assured they are being heard.
Lieberman presented the rules for sessions, such as
"everyone is equal" and "critique ideas, not people," and shared the evaluation
forms she uses for the sessions. She also presented the critical success factors
for the sessions. Among these were: commitment from management for change,
knowledgeable participants, open communication, and extensive follow-up.
Lieberman also spent some time dealing with the challenges, such as groups not
wanting to follow structured agendas (stay focused on the issues, but let the
group do their thing), the director having most of the say (she talked to the
director behind the scenes), and naysayers who didn't want change (the director
persuaded them to join the group).
The session concluded with Lieberman sharing some
resources for further information (iaf-world.org).
|
Conference Session |
Speaker |
|
The Practical Use of a Universal Data Model in the
Data Warehouse |
David Lepley, Data Analyst, Tyco Electronics |
Summary
by Anne Marie Smith
To demonstrate the need for “context” with data, David
gave an overview of the electronics environment and his company’s history
before launching into a presentation on the Tyco global data warehouse
development and its reliance on universal data models.
David’s presentation gave us:
Business Rules Approach: explained the rationale for
business rules in a Data Warehouse, showed the drivers of the business as
fundamental for understanding the data contained in a data warehouse, and
described why these factors pointed Tyco toward a universal model for its data
warehouse.
The Universal Database Concept
and the Universal Database Tables: this is a database design where business
rules about data are stored and used to facilitate development of new and
enhanced applications. David briefly described how Tyco has implemented this
universal database in Oracle, using partitioning and other DBMS facilities.
David's presentation answered the question "Where do
these concepts fit into the Data Warehouse Architecture?" He explained the
roles of data quality in data warehousing and showed how Tyco is changing its culture
to verify and ensure data quality. David cited Barbara von Halle and David
Hay throughout the presentation, drawing on these experts to reinforce his
organization's approach.
He stressed that this approach was unique to his
organization and described the risk the team took in using a universal data model for the
Tyco Data Warehouse. Thankfully, this approach has been successful to date, and
has been helped by their use of flexible structures, business rules and
committed IS and business team members.
|
Conference Session |
Speaker |
|
Understanding and Managing Reference Data |
Malcolm Chisholm, Manager, Deloitte & Touche |
Summary
by Ron Klein
What is Reference Data?
Reference data is any kind of data that is used solely
to categorize other data found in a database, or solely for relating data in a
database to information beyond the boundaries of the enterprise.
Reference data at best is, like Cinderella, forgotten.
Reference data at worst is the "Rodney Dangerfield" of
the world of data: "No respect at all."
1 – Rate of Change - Table
structures change rarely, though there can be exceptions, such as in the world
of foreign exchange rates
2 – Volume – Reference data tables typically have few rows and columns, but
there may be many reference tables in a data model
Q: How do you distinguish reference data from domain?
A: It can be hidden in the domain, causing problems for reporting
3 – Scope - One Reference Data table can have
relationships to many other tables in a single database, or across an enterprise
4 – Meta data and Meaning - Individual values of
Reference Data can have meaning, unlike other data, where attribute
definitions suffice
Reference Data Management Issues
- Implementation is typically in program code, not database tables. Using
values taken from reference data tables is fine; defining values in program
logic that can be used in updates is not
- Usage of external standards. External standards can be useful; however, they
may suffer from "information float" and may not always match the
requirements of the enterprise
- Divergence. Different applications have independent functionality for updating
their own reference data tables. This leads to divergence in data, and the result
is MAPPING whenever data has to be shared between the different databases.
Mapping typically involves semantic analysis, data quality checking, and
resolving granularity problems
Reference Data Management Recommendations
- Accept that reference data is a distinct class of data, different from other
classes of data
- Assign an "owner" for reference data. It needs to be centrally
managed, perhaps by the data administration function
- Develop a strategy for assigning codes and acronyms as primary keys
- Controlled redundancy can be a good strategy
- Publish the content and meaning of reference data for use by developers
and users
Q: Are you sure you can't
find this reference data? What are the obstacles?
A: No one wants to touch it. Ownership usually goes to the Data Administration
group. On the other hand, business users can sometimes own classification
schemas.
Q: What about multiple owners that do
not co-share?
A: That is a third category; it requires a central repository, which is non-trivial.
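The recommendations above are to keep reference values in centrally owned tables rather than program code, and to map the copies that applications have diverged into. They can be sketched in a few lines. This assumes a central country-code table and per-application mappings. The application names "billing" and "shipping" are hypothetical. The central codes and the shipping codes happen to be ISO 3166 alpha-2 and numeric codes.

```python
# Central, centrally owned reference table: code -> meaning (published for developers and users).
CENTRAL_COUNTRY = {"US": "United States", "CA": "Canada", "GB": "United Kingdom"}

# Hypothetical applications that diverged and maintain their own local codes.
APP_MAPPINGS = {
    "billing": {"USA": "US", "CAN": "CA", "UK": "GB"},
    "shipping": {"840": "US", "124": "CA", "826": "GB"},  # ISO 3166 numeric codes
}

def to_central(app, local_code):
    """Translate an application's local code to the central code, or fail loudly."""
    central = APP_MAPPINGS[app].get(local_code)
    if central not in CENTRAL_COUNTRY:
        raise LookupError(f"{app} code {local_code!r} has no mapping to a central code")
    return central

def unmapped(app, local_codes):
    """Local codes in use that nobody has mapped yet (candidates for data quality review)."""
    return sorted(set(local_codes) - set(APP_MAPPINGS[app]))

print(to_central("shipping", "840"))  # US
print(unmapped("billing", ["USA", "MEX"]))  # ['MEX']
```

Values are looked up from tables, not hard-coded in update logic. A new code therefore becomes a data change to a mapping rather than a program change.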
|
Conference Session |
Speaker |
|
Architecting and Implementing
a Web-Based Corporate Meta Data Repository at the Census Bureau |
Gail Wright, Technical Director, Oracle Corporation
|
Summary by Carey Clark
The
Census Bureau does a lot more than count people every 10 years. It is chartered
to conduct community, demographic, and economic surveys of organizations and
businesses throughout the country. For example, every business in the country will
receive a questionnaire in 2002.
The questionnaires ask different subsets of the same
questions depending on the industry and audience. Creating these questionnaires
on paper took months, and analyzing the results was equally labor intensive. So the
goal was to build a corporate meta data repository that would use meta data to
generate surveys, collect and collate the data, and disseminate the results.
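Generating surveys from meta data can be illustrated with a small sketch. It assumes each question is stored once in a registry, along with meta data saying which industries it applies to. A questionnaire is then selected and rendered from that meta data instead of being assembled by hand. The question texts, IDs, and the "*" wildcard convention are hypothetical and do not describe the Census Bureau's actual design.

```python
# Hypothetical question registry: each question is stored once, with meta data
# saying which industries it applies to ("*" means every industry).
QUESTIONS = [
    {"id": "Q1", "text": "Total number of employees", "industries": {"*"}},
    {"id": "Q2", "text": "Value of manufactured shipments", "industries": {"manufacturing"}},
    {"id": "Q3", "text": "Retail sales by merchandise line", "industries": {"retail"}},
    {"id": "Q4", "text": "Annual payroll", "industries": {"*"}},
]

def generate_questionnaire(industry):
    """Select, in registry order, the questions whose meta data says they apply to `industry`."""
    return [q for q in QUESTIONS if "*" in q["industries"] or industry in q["industries"]]

def render(industry):
    """Produce questionnaire text from meta data instead of building it by hand."""
    lines = [f"Survey for the {industry} industry"]
    for n, q in enumerate(generate_questionnaire(industry), start=1):
        lines.append(f"{n}. {q['text']} [{q['id']}]")
    return "\n".join(lines)

print(render("retail"))
```

Adding a question for a new industry means adding one registry entry. Every affected questionnaire then picks it up the next time it is generated.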
Gail covered their reasons for the repository, what was
included in the repository, how it was architected, designed and implemented.
Lastly, she showed how the repository is now poised to be used by nine other
major government departments. Because of this effort, work that took months
can now take days. Data is more reliable, and different kinds of studies are
possible. The whole survey process is now meta data driven.
This repository is remarkable in many respects. It's
large, comprehensive, and based on open industry standards, and it contains
tabular and non-tabular data along with reference materials and full-text search.
While most of us aspire to building a car, they have built a spaceship.
Their repository includes data content, quality,
condition, context and meaning. It includes data models, business models, screen
layouts, mappings and transformations, hierarchies, aggregation rules,
formulas, schedules, access controls and actual code. The repository is composed
of the following components:
Nothing is application-specific. Industry standards are
followed where they exist. XML is used extensively. No software is created or
modified directly; all of it goes into a modeling tool and is generated from
there. Custom components are passed through but are forced to follow the required
standards and process.
Gail described the repository as having a
"tightly-to-loosely coupled architecture" and described it and the tools
used in detail. It's scalable, provides open APIs, is self-documenting,
and is easy to maintain.
She walked us through the interface screens and showed how
the navigation worked and how versatile it was. Security sits beneath a set of
"portlets" and determines who gets to see what. The public can see quite a
bit at the web site, American Fact Finder (factfinder.census.gov).
The effort has gone from being a good idea to being
mission critical. The Census Bureau wouldn't think of running its business
now without it.
Questions
and Answers
Their repository doesn't overlap much with the Common
Warehouse Metamodel. CWM is more focused on tool development at the technical
level; theirs is more focused on the business level.
It took 5 people a year to create the data element
registry. She has 13 people in her group working on various projects.
They decided not to use Java because they didn't have the
skill set. They mostly use Oracle Designer and generate PL/SQL.
Michael Gorman, who introduced Gail, emphasized the
importance of pointing out to executives and others the savings and
benefits a successful project achieved. Memories are short. "Selling after the
sale" enables you to get funding for further projects.
|
Conference Session |
Speaker |
|
|
David Plotkin, Senior Data Administrator, Longs Drug Stores |
Summary by David
Plotkin
Then, the complete metamodel for a repository designed to
store DTDs and XML instance documents was presented. The major sections included
DTDs and entities, DTDs and elements, elements and attributes, and physical
implementation of elements and attributes.
The presenter also covered the functionality that is needed from a repository, including scanning in DTDs, making changes, creating revised DTD output, building sample XML documents from DTDs, and doing impact analysis for changes. In addition, he pointed out that although this application is called a "repository", it is a limited-function implementation and is not that difficult to design and build. However, you still need to use "industrial strength" tools; no desktop databases need apply!
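Two of the functions listed above, scanning in DTDs and impact analysis, can be sketched briefly. This sketch scans only <!ELEMENT> content models and ignores ATTLIST and entity declarations. It assumes a regular expression is adequate, which holds for simple DTDs but not for ones that use parameter entities. The DTD, file name, and function names are hypothetical.

```python
import re

ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+([^>]*)>")
NAME = re.compile(r"[A-Za-z_][\w.:-]*")
KEYWORDS = {"EMPTY", "ANY", "PCDATA"}  # content-model keywords, not element names

def scan_dtd(dtd_text):
    """Return {element name: set of child element names} from <!ELEMENT> declarations."""
    children = {}
    for name, content in ELEMENT_DECL.findall(dtd_text):
        children[name] = {n for n in NAME.findall(content) if n not in KEYWORDS}
    return children

def impact(repository, changed):
    """List (dtd, element) pairs whose content models directly or indirectly contain `changed`."""
    hits = set()
    for dtd_name, children in repository.items():
        frontier, seen = [changed], set()
        while frontier:
            target = frontier.pop()
            for parent, kids in children.items():
                if target in kids and parent not in seen:
                    seen.add(parent)
                    frontier.append(parent)
        hits.update((dtd_name, parent) for parent in seen)
    return sorted(hits)

ORDERS_DTD = """
<!ELEMENT order (customer, line+)>
<!ELEMENT customer (#PCDATA)>
<!ELEMENT line (sku, qty)>
<!ELEMENT sku (#PCDATA)>
<!ELEMENT qty (#PCDATA)>
"""
repository = {"orders.dtd": scan_dtd(ORDERS_DTD)}
print(impact(repository, "sku"))  # [('orders.dtd', 'line'), ('orders.dtd', 'order')]
```

The `seen` set stops the traversal from looping on recursive content models. In a real repository, the scanned structure would be stored in the database rather than in a dictionary.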
|
Conference Session |
Speaker |
|
|
Jill Dyche, Partner, Baseline Consulting Group |
Summary
by Anne Marie Smith
Jill
Dyche, a partner at Baseline Consulting Group, presented the major mistakes of
CRM from a data perspective. Many of the sins are data-related and can be resolved by
better attention to data management. According to Jill, those sins that are not
data-related can be solved in part by a focus on data (and meta data, in the
author’s opinion). However, data analysis cannot be done “in a vacuum” or
bad actions can result.
She
used references from her recent book, “e-Data: Turning Data into
Information” from Addison-Wesley Publishing, offering “real-life examples”
of each sin and its possible solution. Since “there is no such thing as
plug-and-play in CRM” each example and possible solution must be evaluated in
light of an organization’s goals and objectives.
The
many different definitions of CRM are at the root of many of the problems and
sins in CRM implementation. Data management's reliance on definitions can help an
organization develop a solid and reusable definition of CRM to use in all CRM projects.
Sins:
- No Unified CRM Strategy (multiple CRM projects occurring simultaneously)
- Failing to Manage Staff Expectations of the benefits and costs of CRM
- Failure to Define Success in Customer Management
- Outsourcing Hastily (or Not at All)
- Failure to Change Business Processes (Failure to differentiate customers and change
processes based on that customer's value to the organization)
- Not Understanding Product Features and Differences in CRM Approaches
(operational CRM versus analytical CRM)
- Lack of Integration, Understanding and Executive Attention (No "Single Version
of the Truth")
Closing
with Critical Success Factors, Jill reinforced the ideas she opened the
presentation with, concluding with some examples of successful CRM
implementation. Questions to Jill demonstrated the need for education in CRM,
its concepts, implementation and approaches to solving these “7 Deadly
Sins”.
|
Conference Session |
Speaker |
|
Elevating the Role of IRM for Business Effectiveness |
Larry English, Principal, INFORMATION IMPACT International |
Summary
by Margaret O’Hara
English began his presentation by explaining why
traditional approaches to data administration have failed to create positive
impact and acceptance in the enterprise. The cause, he believes, is that we are
still operating under an industrial age paradigm. We fail to view information as
a strategic enterprise resource because we have overlaid IT on obsolete
structures. The industrial age is vertical; the information age is horizontal.
To illustrate this, English noted that all managers (not just HR managers)
can read organizational charts and all managers (not just financial managers) can
read balance sheets, but only IT managers can read data models.
To move from data administration to information
stewardship (which English recommends), the organization must view information
as a strategic resource with a resource management life cycle. This means that
information must be planned for, acquired, applied, maintained and disposed of
in the same manner as other resources.
English presented some trends in data / information
quality to illustrate that it is getting worse:
- in one firm, 66% of 6 million
records were useless
- DA influence seems to be decreasing
- DRM is moving away from the business
- 65% of data warehouse initiatives fail outright
English believes that the term meta-data should not be
used because it has no meaning to non-IT people.
To elevate IRM effectiveness, English believes we must move from data
administration to information leadership, and from being data bigots to being
business bigots. He also told us: Don't sell – listen!
|
Conference Session |
Speaker |
|
Comparison of Data Modeling Techniques |
Panel: Davida Berger (moderator), Graham
Witt |
Summary
by Davida Berger
This was a very lively advanced session with renowned
modeling experts.
ERM
- Provides for the complete definition of information requirements in an
understandable format, such as entities, attributes, relationships, and
generalizations/subtypes
- Well-defined integration with process models: a CRUD matrix relates entities
to processes in the DFD (data flow diagram)
- Entities and attributes can be easily visualized as tables and columns and
implemented in a relational or object-relational database management system
ORM
- Best use is for conceptual information analysis
- Focuses on fact types where objects play roles. Fact instances, types and rules
are verbalized in a formal, graphical and textual language
- Mature and well defined
- Limited modeling tool support
- May be better than ERM for conveying requirements to designers, but not good for
dialog with the business
UML
- Can capture additional elements such as triggers and indexes
- Data and process not as well integrated as in ERM
- Has limitations for database modeling (no key constraints), but is very useful
and may be better than ERM for object-oriented code design
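A CRUD matrix like the one mentioned under ERM can be represented as a mapping from (process, entity) pairs to the operations each process performs. The sketch below uses invented processes and entities. It renders the matrix and applies one common completeness check: every entity should be created by some process. This is an illustration, not a feature of any particular modeling tool.

```python
# Hypothetical CRUD matrix: (process, entity) -> operations drawn from "CRUD".
CRUD = {
    ("Take Order", "Customer"): "R",
    ("Take Order", "Order"): "C",
    ("Ship Order", "Order"): "RU",
    ("Close Account", "Customer"): "UD",
}

def render(matrix):
    """Lay the matrix out with processes as rows and entities as columns."""
    processes = sorted({p for p, _ in matrix})
    entities = sorted({e for _, e in matrix})
    rows = ["\t".join(["Process"] + entities)]
    for p in processes:
        rows.append("\t".join([p] + [matrix.get((p, e), "-") for e in entities]))
    return "\n".join(rows)

def never_created(matrix):
    """Entities that no process creates: a gap between the data and process models."""
    entities = {e for _, e in matrix}
    created = {e for (_, e), ops in matrix.items() if "C" in ops}
    return sorted(entities - created)

print(render(CRUD))
print(never_created(CRUD))  # ['Customer']
```

Here the check reveals that no process creates a Customer. That is the kind of gap the cross-check between data and process models is meant to expose.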
No matter what methodology is used, the model must be
designed to be readable by the business community. Special attention should be
given to the presentation and arrangement of the diagram. Names of entities,
attributes, and relationships should not be cryptic and should represent business
terms rather than computer or system concepts or functions.
|
Conference Session |
Speaker |
|
Meta Data – Myth and Realities |
John Ladley, President
Knowledge
InterSpace, Inc. |
Summary
by Ron Klein
John outlined his experience – he did “James Martin
stuff”. He worked for Meta Group. He worked at integrating everything and
doing Data Administration.
John makes the point that business is “gray” – not
black & white. Collaborative
Intelligence comes about when tacit and unstructured information is factored
into a business decision.
The reality of meta data is that there
are no comprehensive tools, repositories are not capable enough, there are 2-3
standards, and there is too much in-house development. However, CWM is a
tremendous step in standards. Remember that CWM's
scope is limited to data warehouse (DW) and analytic-application-relevant
metadata, while the OIM schema is supposedly capable of handling knowledge
management and business-process constructs. Therefore, enterprises considering
panoramic metadata/repository initiatives may find CWM limiting, though more
broadly supported.
Don’t be afraid to build your
meta data bottom up.
Despite his apparent despair at
the state of meta data products and management, John actually believes the
importance of meta data will increase in the future. His summary slide said:
|
Conference Session |
Speaker |
|
The UPS Meta Data Repository – A Success Story |
Patti Munier, Senior Data Analyst and Manager,
United Parcel Service |
Summary
by Carey Clark
UPS is a large company. Every year it handles 3.28 billion
parcels using 1,700 facilities, 575 aircraft, 149,000 vehicles, and 344,000
employees. It is 93 years old.
UPS uses Computer Associates' Platinum Repository, and
rather than acting as a gatekeeper for new development, the repository team acts
more as a watchdog. They use Platinum's scanners to scan all production databases and
programs throughout the enterprise. They then compare what they find to the meta
data in the repository. Entries that aren’t recognized or don’t meet
standards are flagged for review and brought into compliance. What passes is
parsed and loaded.
Developers use the repository and are required to involve
data administration from the outset of a project. But because Patti’s group is
constantly scanning the end result, they know what is real.
They track over 5,000 keywords and 30,000 data elements, as well as
database structures and copybooks. The repository is updated twice a month.
This data is then distributed through an intranet. The site gets 24,000 hits a
day from users at every level.
One of the key processes is what they call
rationalization. All representations of data are documented and linked back to
the master name and definition. The data description is stored only once. This
enables UPS to do impact analyses quickly. Anyone can find out what data is
being used, where it is being used, and whether or not it’s official. The
benefit of this cannot be overestimated.
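Rationalization as described above can be sketched as follows: one master name and definition stored once, every physical representation linked back to it, and a "where used" query for impact analysis. This is a minimal sketch, not Platinum Repository's data model. The class, system names, physical names, and definition text are all hypothetical.

```python
from collections import defaultdict

class MetaDataRegistry:
    """Hypothetical registry: one master definition per data element, many linked representations."""

    def __init__(self):
        self.definitions = {}                     # master name -> definition (stored once)
        self.representations = defaultdict(list)  # master name -> [(system, physical name)]

    def define(self, master_name, definition):
        self.definitions[master_name] = definition

    def link(self, master_name, system, physical_name):
        """Record that `physical_name` in `system` represents the master element."""
        if master_name not in self.definitions:
            raise KeyError(f"no master definition for {master_name!r}")
        self.representations[master_name].append((system, physical_name))

    def where_used(self, master_name):
        """Impact analysis: every place the element appears, sorted by system."""
        return sorted(self.representations.get(master_name, []))

registry = MetaDataRegistry()
registry.define("Package Tracking Number", "Hypothetical definition text.")
registry.link("Package Tracking Number", "DB2 PKG table", "PKG_TRK_NR")
registry.link("Package Tracking Number", "COBOL copybook PKGREC", "WS-TRACK-NUM")
print(registry.where_used("Package Tracking Number"))
```

Because `link` refuses representations that have no approved master definition, unrecognized names surface for review instead of silently accumulating.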
Meta data types include abbreviation name, full English
name, physical name, standing (approved, non approved, skeleton), source (e.g.
vendor name), descriptions, warehouse description and history. Every data
element ends in a “class word” (e.g., number, text, code, etc.) as part of
its formal name.
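The class-word convention lends itself to a simple automated check of the kind a scanning process could apply. This sketch assumes the class word is the last word of the formal name. The set contains only the three examples given in the talk (number, text, code), because UPS's full list was not presented. The element names are hypothetical.

```python
# Class words named in the talk; UPS's full list is not given, so this set is illustrative only.
CLASS_WORDS = {"number", "text", "code"}

def class_word(formal_name):
    """Return the element's class word (its last word) if it is a recognized one, else None."""
    words = formal_name.split()
    if words and words[-1].lower() in CLASS_WORDS:
        return words[-1].lower()
    return None

def nonconforming(formal_names):
    """Formal names that do not end in a recognized class word, flagged for review."""
    return [name for name in formal_names if class_word(name) is None]

print(nonconforming(["Package Tracking Number", "Shipper", "Service Level Code"]))  # ['Shipper']
```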
This effort has reduced data disparity and
allowed them to decommission the other dictionaries at hubs and distribution
centers. The repository is used for training new employees who are able to learn
the corporate vocabulary quickly.
In the future, Patti's group plans to complete the data
element quality application, provide support for XML, DTDs, and schemas,
automate scanning and loading of SQL Server data, and add business rules.
Patti presented some of the repository’s screens:
Straightforward, understandable and powerful.
|
Conference Session |
Speaker |
|
Universal Data Models for Web Constructs |
Len Silverston, Founder, Universal Data Models |
Summary by David
Plotkin
The motto of the presentation was: "The more you see
the whole, the closer you move towards the truth".
Len presented a series of generalized (or
"universal") models for the following subjects: Web Parties, Web Party
Contact Mechanism, Web Login, Web Site Content, Web Object Usage, Web Visits and
Hits, and Web Star Schema (data warehouse). The common characteristic of these
models is that they did not contain any aspects of the business at the entity
level. Instead, they used very generic terms such as "Party" (person,
organization, or automated agent who participates in a process or transaction),
Party Type (a generalized way of classifying parties) and party role (customer,
referrer, supplier, etc.). Although Len did not model the relationships
themselves in the limited time available, he did state that the roles could not
exist without a relationship. For example, the role "customer" could
not exist without a relationship between parties.
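Len's point that a role such as "customer" cannot exist without a relationship can be made structural. In the minimal sketch below, roles are recorded only on relationships, and a party's roles are derived from the relationships it takes part in. This is a deliberate simplification for illustration, not Silverston's published universal model. The class names and sample parties are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Party:
    party_id: int
    name: str
    party_type: str  # e.g. "person", "organization", "automated agent"

@dataclass(frozen=True)
class PartyRelationship:
    from_party: Party
    from_role: str   # e.g. "customer"
    to_party: Party
    to_role: str     # e.g. "supplier"

def roles_of(party, relationships):
    """Derive a party's roles from the relationships it takes part in."""
    roles = set()
    for rel in relationships:
        if rel.from_party == party:
            roles.add(rel.from_role)
        if rel.to_party == party:
            roles.add(rel.to_role)
    return roles

ann = Party(1, "Ann Lee", "person")
acme = Party(2, "Acme Corp", "organization")
sale = PartyRelationship(ann, "customer", acme, "supplier")
print(roles_of(ann, [sale]))  # {'customer'}
print(roles_of(ann, []))      # set()
```

With no relationship, a party simply has no roles. This mirrors the rule stated in the session.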
Tuesday,
March 6th, 2001
|
KEYNOTE |
Speaker |
|
(and
DAMA Individual Achievement Award) |
Peter Chen, Professor, Louisiana State University |
Summary by Anne Marie Smith
Rose Romero, DAMA International VP of Communication,
presented the 2001 DAMA International Individual Achievement Award to Dr. Peter
Aiken and Dr. E.F. Codd. This is the first time that two individuals have been the
recipients of the Individual Achievement Award. Drs. Aiken and Codd received
this award for their significant contributions in the field of Information
Resource Management. As educators, consultants and authors, they have assisted
numerous companies in developing and maintaining data resource management
environments, thereby expanding and enhancing the roles of information
management professionals. It should be noted that Dr. Aiken is a member of the
DAMA International Board of Advisors.
Other nominees for the 2001 Individual Achievement Award
were:
Larry P. English, David Marco, Dr. James Martin, Dr.
Richard Nolan
After the award ceremony, Dr. Peter Chen, the originator
of the ER model, delivered a keynote address on the relationships among the ER
model, XML and the World Wide Web. Dr. Chen was the recipient of the 2000 DAMA International
Individual Achievement Award. He gave the attendees an understanding of XML and
ER modeling, as well as several good, new buzzwords.
His entertaining and very informative presentation focused
on:
Dr. Chen concluded with his insights on other interesting
research directions in XML and web modeling. He stressed the need for
methodology for modeling in all arenas, and urged the attendees to actively
participate in the expansion and development of understanding of XML and ER
modeling.
|
Conference Session |
Speaker |
|
Business Information Management at Johnson and Johnson: Beginning the Process |
Larry Dziedzic, Information
Management Architect, Johnson &
Johnson |
Summary
by Margaret O’Hara
Larry Dziedzic began his presentation by offering a brief
history of Johnson and Johnson and his personal background in the Information
Management discipline. With 198 diverse companies scattered throughout 52
countries, coming to agreement on any enterprise-wide standards is a daunting
task. The companies are grouped together into three primary divisions: consumer
products (shampoo, band-aids, Tylenol), medical devices and diagnostics (hips,
shoulders, glucose monitors) and pharmaceuticals.
He then presented the initial plan for establishing the
Business Information Management (BIM) program at Johnson and Johnson. Using some
basic and easy-to-understand examples, he explained the particular problems
J&J experiences. For example, when a new fragrance is added to a shampoo,
does it become a new product or a variation on the existing product? Because of
the nature of the J&J culture (with all companies retaining some degree of
autonomy), questions such as this have myriad answers.
Other surprising issues he encountered included the finding that only 70%
of information was correct, and that management was satisfied with that
figure. Moreover, the Information Management Architecture group did not
typically talk to the customers, relying instead on pre-existing information,
which was sometimes inaccurate. Thus, IM's lack of attention to the business
side, and the resulting lack of appropriate information, were fundamental
problems.
Dziedzic went on to illustrate some classic examples of
“bad” information making the news to the detriment of the organization to
which the information applied. Among the specific challenges that J&J faces
are the level of autonomy of the 198 diverse companies, the varying level of
resources for these firms, and the lack of a standard ERP package among the three
primary groups (one has selected JD Edwards and two have selected SAP).
To alleviate the situation, global competency centers
(GCCs) are being formed to liaise with the business community. Thus far, GCCs
have been established for two of the groups, with the third one coming later
this year. These GCCs will work with the global partners to establish unified
applications and implement global strategies. Consultants (internal and
external) are helping to develop the BIM strategies, and best practices and
tools will be utilized.
One major problem J&J faces is that the SAP and JD
Edwards packages will eventually have to interface. More importantly, the task
of implementing the GCCs is very much a people problem – with listening,
educating and communicating being top priorities.
|
Conference
Session |
Speaker |
|
Measuring the Quality of Models |
Peter A. McDougall, Senior Data Administrator, Insurance Corporation of British Columbia |
Summary by Linda Kresl
This presentation focused on an approach for measuring
model quality that Peter developed over five years ago. The criteria for
evaluating a model are based upon the aspects of communication. Furthermore,
since a data model is a composite object, the presentation described how a
model’s quality is actually derived from the collective quality of its
components. Thus any quality measures shouldn’t be applied to the model as a
whole, but instead to its smaller, atomic-level pieces. As such, five
communications-based yardsticks (Accuracy, Clarity, Consistency, Conciseness
and Completeness) were discussed.
Peter also focused on the model review process. He described two
techniques, Direct Feedback and Business-Based Questioning, and how the
quality measures are used with these methods. These
techniques focus on understanding the business unit’s relationship to the
message from the model. They take a nonjudgmental perspective and are designed
to develop a collaborative framework for working towards a quality product.
Lastly, the presentation described how communications-based criteria ultimately
produce better models.
The following topics were discussed by Peter:
·
Why communications-based measures are useful to evaluate the
quality of a model
·
The five criteria used to measure quality
·
A set of techniques for applying the measures
·
Why the approach creates models that have quality built-in,
instead of “inspected in”
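The idea that a model's quality is derived from the quality of its atomic components can be illustrated with a small roll-up. This sketch is not Peter's method. The 0 to 5 rating scale, the simple averaging and the rework threshold are all assumptions made for illustration. Each component (an entity, attribute, relationship or definition) is rated against the five criteria. Those ratings are averaged into a component score, and the component scores are averaged into a model score. The weak components are kept in a separate list, so review effort can be aimed at specific pieces.

```python
from statistics import mean

# The five communications-based criteria named in the talk.
CRITERIA = ("accuracy", "clarity", "consistency", "conciseness", "completeness")


def component_score(ratings):
    """Average one component's ratings (assumed 0-5 scale) across all five criteria."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    return mean(ratings[c] for c in CRITERIA)


def assess_model(components, threshold=3.0):
    """Roll component scores up into a model score and list components needing rework.

    Simple averaging and the threshold value are illustrative assumptions.
    """
    scores = {name: component_score(r) for name, r in components.items()}
    overall = mean(scores.values())
    needs_work = sorted((n for n, s in scores.items() if s < threshold), key=scores.get)
    return overall, needs_work


# Hypothetical components and ratings.
components = {
    "Customer (entity)": {c: 4 for c in CRITERIA},
    "Account.status (attribute)": {
        "accuracy": 2, "clarity": 1, "consistency": 3, "conciseness": 4, "completeness": 2,
    },
}
overall, needs_work = assess_model(components)
print(round(overall, 2), needs_work)
```

Reporting the weak components alongside the overall figure reflects the talk's point: a single grade for the whole model says little about where to improve it.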
|
Conference Session |
Speaker |
|
Organizational and Development Strategies for Creating a High-ROI Enterprise Data Warehouse |
Brent
Lautenschlegar, Principal, Reflection
Technology Corporation |
Summary by Anne Marie Smith
Brent
has much experience in enterprise applications and data warehousing. He used
these experiences to describe the implementation of an enterprise data warehouse
at Delta Air Lines.
Brent
gave an overview of the history of the data warehouse at Delta, which had a
focus of incremental growth. Business users at Delta were not well served by
the company’s Information Technology organization, and this gap formed the rationale for
developing and implementing an enterprise data warehouse. As a result, Brent’s
presentation was more business-oriented than technical, although he did discuss
some very technical topics in answering questions. The teams of users and IT
specialists covered the subject areas of HR, Operations, Finance and
Marketing/Sales. Eventually, this data warehouse was able to “establish a
single version of the truth”. Having a conceptual data model for the
enterprise was essential to the success of planning this massive project,
despite the fact that many subject areas did not have transactional level data
models to use as a basis for the data warehouse. Capturing requirements and
feedback from the user community was a hallmark of the quality effort within
Delta and the data warehouse project.
Brent
outlined the technologies used in this project: Teradata for the DW database;
Brio for querying and reporting; SAS for statistical analysis; Informatica for
extraction, transformation and loading (ETL) and Essbase for multi-dimensional
database management.
Each
module of the data warehouse was developed within a 60-day period, to counter
the perception of a data warehouse as a monolithic project. Incremental
development has many benefits to both IS and users, and gives ownership and
control to the development and implementation teams, as well as demonstrating
the progress of data warehousing to the organization’s management. One
disadvantage to this rapid, incremental development effort was the need to alter
the habits and expectations of database administrators and data administrators /
modelers. These team members were not accustomed to working in this rapid
environment, and some culture change was necessary. Brent explained the steps
the teams used to meet this development deadline, and described some of the
challenges the teams encountered in some subject areas.
Questions
to Brent were both business-oriented (cost-benefits, information use approach,
skill development) and technical (reasons for choosing certain technology,
interfaces and their construction). Questions lasted into the break period.
Summary
by Ron Klein
The purpose of the Library of Alexandria was to gather material
from the countries its rulers conquered, in order to subjugate them. A heck of a business value!
Start with Robert Anthony’s Framework for looking at
enterprises (see page 11 of the speaker’s paper on the CD-ROM). Consider that
knowledge can be viewed in a similar manner (see page 12). Now propose an architected
view of knowledge; a library model is not a good model for the business.
Gil and Frank stressed the following key presentation points:
- The meta views and knowledge content are important to an enterprise
- Meta views are needed to successfully implement critical business applications such as:
  – Enterprise application integration
  – Business performance measurement
  – Customer relationship management
  – Enterprise resource planning
- Knowledge fills, or is connected to, many meta structures
- A meta-data strategy is needed to get the best business value
- Businesses without meta views will gradually fall behind, with failed implementations or
only partial realization of benefits
Integration
will get money because it saves money!
|
Conference Session |
Speaker |
|
|
Andrew
Watson, Technical Director, Object Management Group |
Summary
by Carey Clark
Andrew described the Object Management Group. It is a
not-for-profit body with over 800 members, in which decisions are proposed and
accepted by the members. OMG is not an official standards body like ISO, and no
one is obligated to conform; however, most do. Anyone can access and download
its specifications. There are no fees or passwords.
OMG has numerous task forces and special interest groups
covering all manner of subjects and industries.
UML
OO modeling, like ER modeling, has a wide variety of
notations. By 1994 it was a real mess: similar concepts, incompatible notations,
few support tools. Methodologists are often very stubborn, and getting agreement
is extremely difficult. In ’95 Jacobson and Soley began to push for modeling
standards. By 1997 UML was accepted by all parties. The current version is 1.4.
UML is designed for visualizing and documenting software.
It was not designed for database modeling. UML is not a method but a
convention for representing software constructs. Because of this standard, lots
of tools have been built and over 60 books written. It is now used in over 70%
of IT shops. Until it was adopted no one was willing to invest the capital to
develop tools.
Version 2.0 of the specification is under development and
if you want to influence it, now is the time to speak up. Thirty-seven companies
are already on board.
MOF
The Meta Object Facility is a meta data architecture (i.e.
for repositories). It works in cooperation with UML. It leans heavily on XMI, a
meta data exchange specification. XMI enables meta data to be passed between
modeling tools. This in turn enables DTDs, and later XML Schemas, to go in and out
of modeling tools seamlessly.
CWM
The volume of data in an organization doubles every 5
years. Much of it is redundant and inconsistent. CWM provides a standard way of
handling data warehouse problems. It supports ETL, OLAP, XMI, and UML. In
addition, specifications are being developed by, and for, specific industries.
CORBA
CORBA is a middleware specification. It is a set of
APIs that allow data to be moved from legacy systems to new ones and back.
There is still a lot of COBOL code that needs to integrate with VB, Java,
DBMSs, the Web, etc. CORBA facilitates this integration while staying vendor
independent.
CORBA has been extended to include XML and DOM (Document
Object Model). It enables XML structures to be compacted into a binary format
for easy transport.
Domain-Specific Standards
PIDS, the Person Identification Service, provides a way
for health care providers to identify individuals. There is no reliable unique
identifier for people, and misidentification can mean wrong treatment. Hence,
matching algorithms estimate the probability that two records refer to the same person.
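Probabilistic matching of this kind can be sketched with a Fellegi-Sunter-style record linkage weight, which is a common textbook approach. The sketch does not reproduce the PIDS specification. The field names, the m/u probabilities and the two thresholds are hypothetical values chosen only for illustration. Each compared field adds log2(m/u) when it agrees and log2((1-m)/(1-u)) when it does not. Here m is the chance that a field agrees for two records about the same person, and u is the chance that it agrees for records about different people.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldModel:
    m: float  # assumed P(field agrees | same person)
    u: float  # assumed P(field agrees | different people)


# Hypothetical fields and probabilities, for illustration only.
MODELS = {
    "last_name": FieldModel(m=0.95, u=0.01),
    "first_name": FieldModel(m=0.90, u=0.05),
    "birth_date": FieldModel(m=0.97, u=0.003),
    "postal_code": FieldModel(m=0.85, u=0.02),
}


def match_weight(a: dict, b: dict) -> float:
    """Sum of log2 likelihood ratios; a missing value counts as disagreement."""
    total = 0.0
    for field, fm in MODELS.items():
        agrees = a.get(field) is not None and a.get(field) == b.get(field)
        if agrees:
            total += math.log2(fm.m / fm.u)
        else:
            total += math.log2((1 - fm.m) / (1 - fm.u))
    return total


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Map a weight to a decision; the thresholds are assumed, not standard."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "review"


rec = {"last_name": "Smith", "first_name": "Ann",
       "birth_date": "1960-04-02", "postal_code": "70803"}
variant = dict(rec, first_name="Anne", postal_code="70808")
print(classify(match_weight(rec, variant)))
```

Pairs that fall between the thresholds go to a person for review rather than being merged automatically. That matters when, as the talk notes, a wrong match can mean wrong treatment.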
Resource Access Decision (RAD) specifies how to secure
access to healthcare data. It helps to implement and enforce access policies and
procedures.
|
Conference Session |
Speaker |
|
Embracing XML: Strategic Implications for Data Administrators/Architects |
Peter Aiken, Institute for Data Research, Virginia Commonwealth University |
Summary
by Arnie Hook
Dr. Aiken looks at organizational and legacy assets to
locate opportunities to integrate data with the management of meta data. The
focus is on the evolution of systems. He advises against trying to develop
all components at once, given the time and expense involved.
The presentation identifies XML Benefit and XML
Application Integration ratings for various business and technology classes. XML
is ‘meta data wrapped around data’ and associated with business problems and
planning.
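The phrase "meta data wrapped around data" can be shown with Python's standard xml.etree.ElementTree module. This sketch wraps plain values in elements whose tag names describe what each value means, then reads them back. The entity and field names are hypothetical.

```python
import xml.etree.ElementTree as ET


def wrap_record(entity, fields):
    """Wrap each data value in an element whose tag names what the value means."""
    root = ET.Element(entity)
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def unwrap_record(xml_text):
    """Recover the entity name and its field/value pairs from the XML text."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: child.text for child in root}


# Hypothetical record; the tags are the meta data, the text nodes are the data.
xml_text = wrap_record("customer", {"customer_number": "C-1001", "country": "US"})
print(xml_text)
print(unwrap_record(xml_text))
```

The receiving program needs no separate file layout to interpret the values, because the meaning travels with the data. That self-describing quality is what makes XML useful for data interchange.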
XML equips organizations with the tools and technology to
develop programmatic solutions for managing data interchange environments
using economies of scale. Peter explains the metrics and time problems of
reengineering legacy systems. The often-quoted metric of 7 hours per attribute
definition is a myth when creating project plans.
Aiken uses real-life examples to help the audience
understand the implications of XML, data architecture/engineering, and data
management practices for approaching and defining data solutions. In his
scenarios, systems operations use XML to manage business rules and data interchanges.
Using XML expands the definition, roles, and preparation
required of data management for e-business development. Attendees of this
session benefited from the experience of early XML adopters and from a view of
the role XML will play in future data management.
|
Conference
Session |
Speaker |
|
Enterprise Data Management Without the Enterprise
Data Model: Working in the
Real World |
Sheri Dumire-Hamilton, Senior Systems/Business Analyst, Kodak |
Summary
by Margaret O’Hara
The goals of Sheri’s presentation were to demonstrate
how Enterprise Data (ED) Management would benefit the firm, to present some different
approaches to resolving issues, and to identify some sources and issues of technology
change. The goals of ED Management are to increase data sharing across the
organization, to increase reuse of data and maintain control, to enable
evolution to new technology, and to balance new needs against the stability of
databases over time; in effect, as data evolves, data management must keep up.
An ED Model does several things. It supports the use of
data as a corporate asset; it provides a vehicle for communication and agreeing
about data meaning and usage, and it supports the sharing and reuse of data
across functional areas. Still, ED Models are often not constructed. There are
many reasons for this. Among the reasons are: construction requires support and
direction from senior management, it absorbs resources and may not provide
immediate measurable value, and it is often perceived as a corporate mandate
with little value to specific functional areas.
So, where can you start to develop ED Management? First,
select a problem that data management will address with a high probability of
success. Symptoms of DM problems include: lots of interfaces being written,
customer complaints about supplying information repeatedly and errors due to bad
data, data unavailable for decision making, problems in enterprise data
management, and difficulty in meeting changing business needs.
To handle the problems, first define the problem domain
then plan the approach to solve it. It is important to publish the approach and
review it with affected areas. One thing that can “bite you” is that, although
there is common ground, everyone is fighting for a piece of it. To alleviate this,
find a champion and form a steering committee. Power struggles occur because
data is not seen as a corporate asset. Educating the concerned parties about
the nature of data management helps them view data as a corporate asset. Finally,
it is important to network, communicate and educate the involved parties. Build
relationships with individuals to increase their comfort level, their trust and
your own credibility.
|
Conference
Session |
Speaker |
|
How
do you Convince Management to fund your Proposal? |
David Davis, Vice-President, Enterprise Data Management Group, Bank One |
Summary
by Linda Kresl
This presentation focused on the political maneuvering
required to persuade and convince management to fund projects. David explained
that people with technical backgrounds often stress the technical aspects of a
proposal to their detriment. The context of the proposal, its timing and how
it is presented often determine whether a good proposal is accepted or rejected.
Anecdotes, analogies, marketing and alliance-building can lead to
successful, approved proposals and projects. The best implementation, technique,
new technology and method do not guarantee acceptance and funding.
This presentation further explained the following steps to
ensure success:
·
Learn that the work involved in “selling” a proposal may be as
difficult and necessary as the project itself
·
A technique of creating analogies
·
Share successes and failures
·
Learn the importance of ‘sound bites’, charts and diagrams to
sell proposals
|
Conference Session |
Speaker |
|
Data Warehouse Project Planning |
Sid
Adelman, Founder, Sid Adelman & Associates |
Summary
by Anne Marie Smith
Sid
Adelman, consultant and co-author of the book “Data Warehouse Project
Management” presented a roadmap for developing a successful data warehouse
project plan.
Sid
outlined the history of data warehouse project planning, why project planning is
critical to the success of any development effort, what constitutes a proper
data warehouse project plan and how to relate the project plan to the technical
infrastructure.
To
date, many organizations have, for various reasons, chosen not to plan their data
warehouse projects. Almost without exception, these unplanned
projects have failed, and according to Sid’s research, this failure can be
traced to the lack of a concrete project plan. This presentation showed the
similarities between traditional systems development and data warehouse
development, as well as the few differences.
Major
points in Sid’s presentation included:
·
Project
Selection: choose sponsors and users who really want the project to succeed; a
project with importance to the organization; a project that WILL succeed (not
necessarily a high-profile or controversial project); a project with
measurable success factors; a project of reasonable size (database and
interfaces) with reasonable time expectations; project control
·
Function:
source data (from where are you getting the data, is it reliable and clean?);
determine needed summaries, aggregation and integration methods; develop
appropriate canned queries, issues in the meta data repository for a DW
(user-oriented). User and technical functionality are different, and the
differences must be understood and evaluated.
·
User
Expectations: performance (sub-second response time is unrealistic from a DW),
simplicity (ease of use of the user tools, easy to understand navigation),
accuracy (clean data, correct data – these are different), availability (do
you really need 24x7, 365? This is very expensive and usually not a true
requirement), timeliness (data refresh expectations must be established),
difference between summary and detail data access needs. Traditionally, success
has not been well defined; it can be achieved through clear communication of expectations
·
Scheduling:
taking a phased approach (by subject area and user role delineation) is the
foundation of a successful data warehouse; task estimation (a difficult task;
experience affects the amount of time needed to complete a task); actual
hours worked versus elapsed time (which measurement will you use? Use both);
the need to build contingency factors into a plan, since interruptions will
always occur; schedule responsibly, since too-tight schedules force people to do
re-work. Delivering low-quality results quickly is NOT a method for success! Sid
felt that a 60-day phase was a bit too short, and recommended a 3-month phase.
·
User
Responsibilities: co-project management (IT and user managers), users must
define requirements (NOT the IT staff), security requirements (essential in web
access to data), determining roles in query and reporting tool selection (it is not
necessary to involve users in infrastructure tool selection), user involvement
in training material development and implementation
·
Tools
and Service Agreements: performance and response time requirements are not
appropriate for a DW, but availability and problem response time requirements
are appropriate for a DW, DW implications on the work of the Help Desk or other
support mechanisms
·
DW
Project Planning: the usual steps of application development project planning
apply, each task should not exceed a 40-hour period, each task should have a
primary responsible party (even if more than one person is assigned to the task),
each task should have a defined deliverable, each deliverable should be
evaluated for completeness and contribute to a defined milestone, progress
monitoring and change control management are also important and frequently
forgotten
·
Resources:
people versus roles (some people can fill multiple roles, but should they?),
development and maintenance of a capabilities and skills assessment for all team
members, direct reporting relationship (100% focus on the DW project),
importance of management commitment and active support across and through the
organization
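Several of the task-level planning rules above are mechanical enough to check automatically. This sketch encodes three of them: a task should not exceed 40 hours, it should have a primary responsible party, and it should have a defined deliverable that contributes to a milestone. The Task structure and its field names are hypothetical. Sid did not present code, and a real plan would normally live in a project-management tool.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_TASK_HOURS = 40  # per-task ceiling cited in the talk


@dataclass
class Task:
    name: str
    estimated_hours: float
    owner: Optional[str] = None        # primary responsible party
    deliverable: Optional[str] = None  # defined deliverable
    milestone: Optional[str] = None    # milestone the deliverable contributes to


def plan_problems(tasks: List[Task]) -> List[str]:
    """Return one message per violation of the task-level planning rules."""
    problems = []
    for t in tasks:
        if t.estimated_hours > MAX_TASK_HOURS:
            problems.append(f"{t.name}: {t.estimated_hours} h exceeds {MAX_TASK_HOURS} h; split it")
        if not t.owner:
            problems.append(f"{t.name}: no primary responsible party")
        if not t.deliverable:
            problems.append(f"{t.name}: no defined deliverable")
        if not t.milestone:
            problems.append(f"{t.name}: deliverable not tied to a milestone")
    return problems


# Hypothetical tasks from a first data warehouse phase.
tasks = [
    Task("Load customer dimension", 60, deliverable="ETL mappings", milestone="Phase 1 load"),
    Task("Define summary tables", 24, owner="Pat", deliverable="Summary spec",
         milestone="Phase 1 design"),
]
for problem in plan_problems(tasks):
    print(problem)
```

Progress monitoring and change control, which the talk also stresses, are process concerns and are left out of this sketch.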
Sid
concluded with offers of some reference material (web links, task examples,
suggested vendors) to interested attendees.
There
were numerous questions, and they included the issue of data cleansing at the
source (do you go back and clean up data that is clean in the DW and not clean
in the source?), the best format of a project plan for a DW (iterative or
spiral), cost-benefit analysis of a DW (see an accountant!), choices in various
tool categories, and specific roles to be included in any data warehouse
project. These questions showed the level of interest in data warehousing and
its “resurgence”. They also demonstrated the need for more presentations on
data warehousing and project management.
|
Conference Session |
Speaker |
|
Meta data Directory vs. Meta Repository |
James Jones, Product Manager, Oracle Corporation |
Summary
by Ron Klein
James started by citing Oracle’s own experience in
streamlining its business using its own solutions, i.e. “eating our own dog
food”.
Lightweight Directory Access Protocol (LDAP) is the
Exploding Standard. It is a light, browser-friendly client implementation.
What are Directory Services?
- “A flexible, special-purpose distributed database designed for the
storage and retrieval of entry-oriented information for a wide range of
applications.”
- Directory services (DS) are a type of meta data universe
The Meta Directory Paradigm:
- Touches everything and is everywhere
- A single directory that connects everything
Stretching
the idea of meta data persistency and sharing:
Nodes
+ Hubes = Nubes <- ETL
|
Meta Directory |
Meta Data Repository |
|
Metadata (hierarchical): security, party, network, device; security integration; device integration; giant installed base |
Metadata (any): managing files and folders; dependency management; versioning; configuration management; tool integration; small installed base |
Q: Is the Meta Directory usually a source for the
repository?
A: It is a place where this information can be stored, but it is not robust enough to
hold the complexity.
Q: Should we be hanging off these directories underneath a
repository?
A: Underneath a portal, yes.
|
Conference Session |
Speaker |
|
Ramping
up for Meta Data and Knowledge Management |
Don
Soulsby, Director
of Architecture Strategies,
Computer Associates |
Summary
by Carey Clark
In the beginning
was Electronic Data Processing (EDP). The focus was on handling files and getting
data in. In the 1980s the focus was on getting the data out (DSS, EIS,
queries). The current age will be known as the knowledge management era. Tabular data
needs to merge with documents, graphics, and video. The buzzwords are integration
and access, and like before, tools follow the need.
Knowledge Management
Knowledge is
information (data) at work. Eighty to ninety percent of corporate information is
not tabular in nature. The issue is how to store and retrieve it efficiently and
combine it with pertinent tabular data. The difficulty is compounded by the
tribal nature of various disciplines: Data Processing, Library Science,
multimedia, desktop applications, Web technologies etc.
Legacy systems
tend to resemble spaghetti. When using third party packages one must not only
use other people’s products, but other people’s models. How, then, do you
find what you are looking for? In 15 years the baby boomers begin to retire and
their knowledge goes with them. It behooves organizations to capture as much as
possible before they go.
It’s a massive
problem, not unlike building the Empire State Building or the Queen Mary. As in
those cases, a key factor was having the right tools (e.g. the rivet gun).
The Solution: An Enterprise
Information Portal
Create a single
place where all information can be accessed and displayed. Integrate the various
forms of information. Where possible, provide dynamic personalization. Make it
easy to find, easy to understand, easy to navigate, and believable. Provide
information in context as a way to recognize what you have.
Most knowledge
architectures are hierarchical. This is efficient for getting somewhere fast but
not for finding stuff in the first place. A better model is the Knowledge Mall.
You can find stuff alphabetically, by category, by context, and by wandering
around. Still, there is a need for a map.
Don’s technique
was to classify information using the Zachman Framework. Vertically, there are
rows for Business, Operational, and Technical; horizontally, there are columns
for Who, What, and Where. This could be expanded to match Zachman’s 6x6
matrix.
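Don's classification idea amounts to filing each piece of information under a (row, column) cell, which then serves as the "map" for finding it. This is a minimal sketch that assumes only the three rows and three columns named above, with hypothetical item names. Extending it toward the full Zachman matrix would just mean adding entries to the two tuples.

```python
from collections import defaultdict

# Rows and columns as named in the talk; the full framework has more of each.
ROWS = ("Business", "Operational", "Technical")
COLUMNS = ("Who", "What", "Where")


def build_map(items):
    """File (name, row, column) items into cells keyed by (row, column).

    Returns a defaultdict, so looking up an empty cell yields an empty list.
    """
    grid = defaultdict(list)
    for name, row, column in items:
        if row not in ROWS or column not in COLUMNS:
            raise ValueError(f"{name!r}: unknown cell ({row}, {column})")
        grid[(row, column)].append(name)
    return grid


# Hypothetical items classified into the grid.
items = [
    ("Account manager role", "Business", "Who"),
    ("Customer definition", "Business", "What"),
    ("CUSTOMER table", "Technical", "What"),
    ("Regional data center", "Technical", "Where"),
]
grid = build_map(items)
for row in ROWS:
    print(row, [len(grid[(row, col)]) for col in COLUMNS])
```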
Personalization
involves knowing specifics about the user, such as buying patterns, sales
patterns and demographics. Based on these, the user sees different screens, menus,
options etc. Software behind the scenes is able to learn, predict, adapt, and
optimize. Patterns are recognized and used extensively to present information or
suggest new resources.
Observations/Predictions
Metadata
repositories are likely to adopt parallels to retail’s UPC codes. Data will
have truly unique identifiers.
Meta data must be
collected in response to a business event. If people have to enter it manually,
it most likely will not be maintained. The imperative is to decrease the number
of duplicate instances. Store once, distribute many.
He expects
knowledge navigation and supporting software to resemble the neural net: It
recognizes patterns, learns from experience, adapts dynamically, and predicts
outcomes.
|
Conference Session |
Speaker |
|
Building the Scalable Data E-Frastructure |
Tim McBreen, Senior Principal and E-business
Practice Leader, Knightsbridge Solutions |
Summary
by Arnie Hook
The theme of the talk is to make sure we are ‘building
the enterprise infrastructure’. Tim describes the high-performance data
solution, which is robust, scalable and cost-effective, and explains why
performance matters given data volumes, quality of use, and the influx of data.
Mr. McBreen says that scalability rules the day: build it
once, build it right, scale often. The e-frastructure data engine includes:
Data acquisition processes
Data repository
Data mart creation processes
Tim describes a typical solution encompassing data
extraction, transformation, aggregation, and balancing/controls & loading.
A “tool of the month club” approach will not work for managing the e-frastructure
environment. Changing tools creates chaos for impact analysis and for applying new
requirements.
Business path – end user focus
Data path – design, development focus
Infrastructure path – design, configuration, implementation focus
Mr. McBreen stresses the importance of data management
solutions that allow companies to enable the ‘power enterprise’. This is a
compelling message for the data practitioner needing an approach to deliver a
data warehouse application.
|
Conference
Session |
Speaker |
|
Data Administration on A Shoestring
|
Becky
Kirkpatrick, Data Architect
Union Pacific Technologies |
Summary by Margaret O’Hara
Becky began by describing how Union Pacific IM has adjusted
to staff downsizing, mergers and a lack of funding to provide an online
metadata repository that was quickly put together, is very functional and
continues to grow in use and in capability.
The result of the work of Becky and 1.5 full-time employees is a web-enabled
site organized using the Zachman Framework.
The problem, as Kirkpatrick detailed, is that end users
and IT project managers want to know immediately where they can get state and
country data, customer number information and values from a railroad equipment
master.
None of these important questions could easily be answered
by any means then available.
Kirkpatrick’s management asked that her group of 2.5 people put something
together within a 3 to 4 month period.
The team ‘piggy-backed’ on existing files (manual and
automated), utilized existing technologies (e.g. Access, Excel), coupled those
with web development, and produced a product that was successfully implemented
and accepted.
Kirkpatrick then walked the audience through a
demonstration of the online site that they developed.
|
Conference
Session |
Speaker |
|
Mapping
UML to the Zachman Framework |
Neal Fishman, Enterprise Architect, Equifax |
Summary by Linda Kresl
This presentation focused on why it is important to map
the UML to the Zachman framework. The number one reason is to model systems,
from concept to executable artifact, using object-oriented techniques.
• To address the issue of
scale inherent in complex, mission-critical systems.
• To create a modeling
language usable by both humans and machines.
• Use the UML for...
– Visualizing
– Specifying
– Constructing
– Documenting
Neal explained that the UML consists of nine models and
the Object Constraint Language (OCL). The Zachman Framework for Enterprise
Architecture identifies at least thirty models. This presentation reviewed each
UML model type (use case, class, object, component, deployment, activity,
statechart, collaboration, sequence, and OCL) and identified which of the Zachman
cells they map to. The presentation then explored the use of stereotypes to
augment the native UML models in creating more model types to demonstrate how to
complete the mapping to the framework.
·
Identifying the UML models
·
The Zachman Cells
·
Using stereotypes
·
Mapping the models
|
Conference Session |
Speaker |
|
Managing Customer Information for CRM |
Danette McGilvray, Customer
Information Quality Program Manager, Agilent
Technologies |
Summary
by Anne Marie Smith
Danette
asked and answered the questions “Can you claim to know your customer if the
information in your systems about that customer is wrong?” and “How can you
manage the relationship with your customer if the basic processes for acquiring,
maintaining and using that information are not working?”
Danette’s
presentation focused on these points:
Danette
presented a case study in CRM, using Agilent’s customers as the basis of a CRM
initiative. The case examined a customer information system (one of many at this
client) to determine the level of effectiveness for CRM. The system was
developed with the framework mentioned above, and was used as a method to
re-engineer the customer approach at this client. She concluded with some
examples of uses of information in a CRM pilot system and a list of challenges
to CRM.
Questions,
taken throughout the presentation, concerned the framework’s development,
uses of information in CRM, and the reasons for CRM failure. Danette’s
presentation showed the role data plays in a CRM effort, and the need for
quality data in CRM.
|
Conference Session |
Speaker |
|
|
Bob Carasik, Systems Architect, Wells Fargo Bank |
Summary
by Ron Klein
Bob has worked
with data dictionaries for two decades and still does. He has worked with
CASE, repository, XML and messaging standards. He currently co-ordinates the
meta data initiative for the enterprise portal. Wells Fargo is a leader in
Internet banking, eBay and account aggregation for customers. You will hear more
and more that all your financial services can be bundled in one site. Clients do
one log-in and have access to many financial services.
Doing
the Portal = Reality hits people in the face. We
need to know about meta data. It is quick to explain why it is important, but
getting it into the project plan is another story.
Mapping is a hot spot to help
find inconsistencies.
The goal is to
make the transitions easier. Has to be bottom up and has to be distributed.
You find meta data
automatically through the Web.
Messages have
traditionally been buried deep inside systems. Now they surface as meta data and
turn out to be as important as database schema meta data.
End users
don’t need to understand the meta data; they just do quick searches.
Meta Data Paradigms: The Ideal
The old idea
of centralizing everything did not work. The enterprise-wide model is also a
challenge, though Bob can see its advantage: DHL, for example, expanded its package
ID field by one character and has been dealing with the consequences for many years.
Bob strongly
suggests the federated approach to meta data. This accepts that semantics differ
across the enterprise but provides a common format for meta data.
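A minimal sketch of the federated idea, assuming each system publishes entries in one shared record format while keeping its own local definitions. The record fields, source names and keyword search are hypothetical and do not reflect Wells Fargo's actual implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetaDataEntry:
    source: str      # system that owns and maintains the entry
    name: str        # local name of the data element
    definition: str  # local meaning; the federation does not force agreement

class FederatedCatalog:
    """Collects entries from many sources without reconciling their semantics."""

    def __init__(self):
        self._entries = []

    def publish(self, entries):
        # Each source publishes in the common format; no central model is required.
        self._entries.extend(entries)

    def search(self, keyword):
        # End users only need a quick keyword search across all sources.
        kw = keyword.lower()
        return [e for e in self._entries
                if kw in e.name.lower() or kw in e.definition.lower()]

catalog = FederatedCatalog()
catalog.publish([MetaDataEntry("brokerage", "account", "A customer's brokerage position holder")])
catalog.publish([MetaDataEntry("deposits", "account", "A checking or savings account")])
for entry in catalog.search("account"):
    print(entry.source, "-", entry.definition)
```

Two sources define "account" differently and both stay visible; the common record format is what lets one search span them.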
He also proposes a lightweight meta data strategy, built step by step. It
recognizes that high-quality meta data frequently costs too much to provide,
relative to its benefits to users. You don’t need a full
repository to begin with.
Resources can
come when you show how much conversation will be needed.
Lower your
standards! You’ll feel good when you deliver.
Q: When you
gather meta data, are you building processes to keep it fresh?
A: Share tags and retrieve them on a project-by-project basis; it is a
negotiation process. It is not necessarily repeatable. That is the just-in-time
concept here.
Q: What tools?
A: Oracle, sometimes a Web resource. Repository technology meets some needs but
not all. We have a standard for XML development.
Q: How do you
handle change management?
A: It is specific to each project.
Bob
gave a very good piece-by-piece presentation on an issue that most of us
are dealing with now: developing portals, and piggybacking on them to develop and
gather meta data. Relevant notes on the project: we have a two-level taxonomy.
Allies: our technical library, our internal web team, the PMO. Lots of goodwill for
meta data creation. XML Schema as a documentation tool: from a document language to
a data language. A modified Dublin Core for defining meta tags.
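The notes mention a two-level taxonomy and a modified Dublin Core for meta tags. The sketch below shows one way such tags might be checked. It uses the 15 Dublin Core element names, but the "required" subset, the "category/subcategory" subject format and the taxonomy values are hypothetical; the actual modifications were not described.

```python
DUBLIN_CORE = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}

# Hypothetical local modification: these elements are mandatory on portal pages.
REQUIRED = {"title", "creator", "subject", "date"}

# Hypothetical two-level taxonomy: top-level category -> allowed subcategories.
TAXONOMY = {
    "banking": {"accounts", "loans", "payments"},
    "investing": {"brokerage", "retirement"},
}

def validate_tags(tags):
    """Return a list of problems found in one page's meta tags (empty if valid)."""
    problems = [f"unknown element: {name}" for name in tags if name not in DUBLIN_CORE]
    for name in sorted(REQUIRED - set(tags)):
        problems.append(f"missing element: {name}")
    subject = tags.get("subject", "")
    top, _, sub = subject.partition("/")
    if subject and sub not in TAXONOMY.get(top, set()):
        problems.append(f"subject not in taxonomy: {subject}")
    return problems

page = {"title": "Online Bill Pay", "creator": "Web Team",
        "subject": "banking/payments", "date": "2001-03-01"}
print(validate_tags(page))  # []
print(validate_tags({"title": "Rates", "subject": "banking/mortgages"}))
```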
Conference Session | Speaker
Architectures for Marrying Online Applications with Information Repositories | Faisal Shah, Chief Technology Officer, Knightsbridge Solutions
Summary by Carey Clark
Knightsbridge
Solutions works with Fortune 500 companies to marry data from transaction
processing systems with data from data warehouses. Conceptually this seems trivial:
one might suppose you just create a front end to display data from both
environments.
In practice,
however, doing this is very difficult. The difficulty arises from the
fundamentally different “quality of service” requirements of each
environment. Explaining and resolving this difficulty is the subject of the
presentation.
So
why marry these two data sources in the first place?
A bank wants to
provide customers with analytical information about their investments, e.g.,
reporting tools to compare their portfolio with industry indices (such as the Dow
Jones and Standard & Poor’s). Customers want to compare their performance with
newsletter or broker recommendations. This is a serious competitive advantage if
the bank’s competitors don’t offer it.
An Internet
hosting service wants to provide its advertisers with real-time statistics:
the number of visits to a site, the kind of visitors they were, what web pages
were visited, etc. This must be done on an hourly basis; two days late is
unacceptable. In both cases historical, precomputed data is displayed along with
real-time transaction data.
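To make the combination concrete, here is a small sketch, assuming the warehouse holds hourly visit counts loaded up to some cutoff time and the transaction system holds raw visit timestamps. It merges the precomputed history with events recorded after the cutoff, so the display is current without re-aggregating history. All names and figures are hypothetical.

```python
from collections import Counter
from datetime import datetime

def hourly_visits(warehouse_hours, live_events, cutoff):
    """Merge precomputed hourly counts with live events at or after the cutoff.

    warehouse_hours: {hour (datetime truncated to the hour): visit count}, loaded in batch
    live_events: iterable of visit timestamps from the transaction system
    cutoff: time of the last warehouse load; earlier events are already counted
    """
    totals = Counter(warehouse_hours)
    for ts in live_events:
        if ts >= cutoff:
            totals[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(sorted(totals.items()))

cutoff = datetime(2001, 3, 5, 10, 0)
warehouse = {datetime(2001, 3, 5, 8): 120, datetime(2001, 3, 5, 9): 95}
live = [datetime(2001, 3, 5, 9, 55),  # before the cutoff: already in the warehouse
        datetime(2001, 3, 5, 10, 5), datetime(2001, 3, 5, 10, 40)]
for hour, count in hourly_visits(warehouse, live, cutoff).items():
    print(hour.strftime("%H:00"), count)
```

The cutoff is what keeps the two sources from double counting; its freshness is exactly the service-level question the talk goes on to discuss.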
Quality of Service Levels
A typical online
transaction processing (OLTP) system requires 24x7 uptime, sub-second response
times, and 100% accuracy. It must be fault tolerant, even against disasters. It has
very narrow maintenance windows, usually minutes.
Data warehouse
systems are typically up 12 hours a day, 6 days a week (12x6). The off hours are
needed for batch processing. They don’t need transaction monitors, the data
can be a day or two old, response time can be several minutes, and if something
goes wrong, you’re not out of business.
And herein lies
the problem: Transaction systems can’t tolerate warehouse service levels and
warehouse systems can’t realistically achieve transaction service levels.
Users who see both data types at the same time assume the same service level.
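The mismatch can be stated as a simple dominance check. This sketch encodes the two profiles from the figures above (24x7 versus 12x6; sub-second versus several minutes; current versus a day or two old); where the talk gave ranges, the specific numbers (1 second, 300 seconds, 48 hours) are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceLevel:
    hours_per_week: int          # scheduled availability
    max_response_seconds: float
    max_data_age_hours: float

    def satisfies(self, required):
        """True if this level is at least as good as `required` on every axis."""
        return (self.hours_per_week >= required.hours_per_week
                and self.max_response_seconds <= required.max_response_seconds
                and self.max_data_age_hours <= required.max_data_age_hours)

OLTP = ServiceLevel(hours_per_week=24 * 7, max_response_seconds=1.0, max_data_age_hours=0)
WAREHOUSE = ServiceLevel(hours_per_week=12 * 6, max_response_seconds=300.0, max_data_age_hours=48)

print(WAREHOUSE.satisfies(OLTP))  # False
print(OLTP.satisfies(WAREHOUSE))  # True
print(f"warehouse availability: {WAREHOUSE.hours_per_week / OLTP.hours_per_week:.0%}")
```

A 12x6 schedule covers only about 43% of the hours a 24x7 system must be up, before response time and data age are even considered.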
What to do
It is really
important to perform a careful ROI analysis. Ambitious requirements can be
outrageously expensive, even for large companies. The best solution is a set of
trade-offs.
The first reality
is that one cannot put both transaction and warehouse data on the same box; it is
just not feasible.
The second
reality is that you can’t divvy up warehouse data into mini warehouses. Doing so
forces you to decide in advance what queries will be asked. If you split by time,
then geography is a performance problem; if by type, then time
is a performance problem.
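Why pre-splitting bakes in the query pattern shows up when you count how many partitions a query must read. In this sketch (hypothetical months and regions), data is split by month: a query for one month reads a single partition, while a query for one region has to read every partition.

```python
MONTHS = ["2001-01", "2001-02", "2001-03"]
REGIONS = ["east", "west", "central"]

# Hypothetical mini warehouses split by month; each holds one row per region.
partitions = {m: [(m, r) for r in REGIONS] for m in MONTHS}

def query(month=None, region=None):
    """Return (matching rows, partitions read) against month-partitioned data."""
    if month is not None:
        # The filter matches the partition key, so only one partition is read.
        to_read = [month] if month in partitions else []
    else:
        # A filter on any other column (here, region) cannot skip partitions.
        to_read = list(partitions)
    rows = [row for p in to_read for row in partitions[p]
            if region is None or row[1] == region]
    return rows, len(to_read)

print(query(month="2001-02")[1])  # 1
print(query(region="west")[1])    # 3
```

With three partitions the difference is small, but it grows with every partition added, which is why the choice of split must anticipate the queries.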
In almost all
cases, the online transaction system must remain fast and reliable so every
effort is made to impact it as little as possible. One successful technique is
to precalculate and preaggregate a small standard set of queries and load that
data on the transaction system. This precludes complex and ad hoc queries, but
still provides immense value.
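A minimal sketch of the precalculate-and-preaggregate technique, assuming a batch job summarizes trades into a small table keyed by account and month, which the online system then serves with a single key lookup. The table layout and names are hypothetical, and two in-memory sqlite3 databases stand in for the warehouse and the transaction system only so the example runs.

```python
import sqlite3

def build_summary(warehouse, online):
    """Batch step: aggregate once in the warehouse, load the result into the online DB."""
    rows = warehouse.execute(
        "SELECT account_id, substr(trade_date, 1, 7), SUM(amount) FROM trades "
        "GROUP BY account_id, substr(trade_date, 1, 7)").fetchall()
    online.execute("CREATE TABLE IF NOT EXISTS monthly_summary ("
                   "account_id TEXT, month TEXT, total REAL, "
                   "PRIMARY KEY (account_id, month))")
    online.executemany("INSERT OR REPLACE INTO monthly_summary VALUES (?, ?, ?)", rows)
    online.commit()

def monthly_total(online, account_id, month):
    """Online step: one primary-key lookup, no ad hoc aggregation on the OLTP side."""
    row = online.execute("SELECT total FROM monthly_summary "
                         "WHERE account_id = ? AND month = ?", (account_id, month)).fetchone()
    return row[0] if row else 0.0

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE trades (account_id TEXT, trade_date TEXT, amount REAL)")
warehouse.executemany("INSERT INTO trades VALUES (?, ?, ?)", [
    ("A1", "2001-02-03", 100.0), ("A1", "2001-02-20", 50.0), ("A1", "2001-03-01", 25.0)])
online = sqlite3.connect(":memory:")
build_summary(warehouse, online)
print(monthly_total(online, "A1", "2001-02"))  # 150.0
```

The online side can answer only the questions the summary was built for, which is the "precludes ad hoc queries" trade-off in code form.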
Another technique
is to limit analytical data to the time dimension. Data can sometimes be
distributed across multiple database instances. Thousands of trade-offs are made,
for example weighing refresh times, or deciding whether to put analytical data in
with the transaction data or in a separate instance. Backup and restore can be handled,
but the data currency is different for the two environments. How different is
part of the trade-off analysis.
In a few
situations the amount of data was so large that putting it in a relational
database was cost-prohibitive. In these cases the client resorted to creating
massive flat files.
A favorite
technique is to toggle between multiple database servers, multiple database
instances, or multiple database tables. This technique doubles or triples
refresh times and hardware costs, but it works.
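The table-level variant of the toggle can be sketched with two copies of a table and a view that points at whichever copy is live: the batch load fills the inactive copy while queries keep reading the active one, and the switch is a quick redefinition of the view. This is one common way to implement the idea, shown with sqlite3 and hypothetical table names; the session did not say which mechanism clients used. Keeping two full copies is also where the extra hardware cost comes from.

```python
import sqlite3

class ToggledTable:
    """Serve queries from one copy while reloading the other, then swap."""

    def __init__(self, conn, name):
        self.conn, self.name = conn, name
        self.copies = (f"{name}_a", f"{name}_b")  # internal names, not user input
        self.active = 0
        for table in self.copies:
            conn.execute(f"CREATE TABLE {table} (metric TEXT, value INTEGER)")
        conn.execute(f"CREATE VIEW {name} AS SELECT * FROM {self.copies[0]}")
        conn.commit()

    def refresh(self, rows):
        inactive = self.copies[1 - self.active]
        # Reload the offline copy; the view still points at the active copy.
        self.conn.execute(f"DELETE FROM {inactive}")
        self.conn.executemany(f"INSERT INTO {inactive} VALUES (?, ?)", rows)
        # Repoint the view in the same transaction, so other connections
        # see either the old copy or the new one once this commits.
        self.conn.execute(f"DROP VIEW {self.name}")
        self.conn.execute(f"CREATE VIEW {self.name} AS SELECT * FROM {inactive}")
        self.conn.commit()
        self.active = 1 - self.active

conn = sqlite3.connect(":memory:")
stats = ToggledTable(conn, "site_stats")
stats.refresh([("visits", 100)])
stats.refresh([("visits", 250)])
print(conn.execute("SELECT value FROM site_stats").fetchall())  # [(250,)]
```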
Conference Session | Speaker
Getting the Rest of Your Organization Ready for XML | Korki Whitaker, Progressive Insurance
Summary by Arnie Hook
Ms. Whitaker presented advice on introducing XML discovery
activities, indoctrinating employees, and meeting training needs. The talk is
based on experience gained at the Progressive Insurance Co., where she is
responsible for data-related teaching and development.
Korki’s advantage in promoting XML is that Progressive’s
management understands its benefits, makes a place for new technologies, and
allocates resources to promote them.
The Data Engineering group led the activity by surveying
management and by checking software acquisition areas for XML tool requests.
They also examined projects for interfaces to internal and external systems,
and proactively got involved in the detailed requirements of projects.
Up-front analysis included documenting and highlighting
accomplishments on current projects. This was critical to show management the
progress and successes with XML. XML projects require a high-level management
sponsor in order to form a project and a development group (an internal XML forum)
aligned with business requirements. The XML forum is an established common-interest
group with regular sessions and subcommittees.
Ms. Whitaker’s group developed a database of XML best
practices and a training program to extend knowledge throughout the
organization. The group’s objectives are comparable to those of learning a new
programming language. A core group promoting and mentoring XML as a new technology
benefited project progress.
Korki’s experience and presentation give any new XML
advocate material for introducing XML as a new technology.
Conference Session | Speaker
Data Modeling Contentious Issues | Karen Lopez, Principal Consultant, InfoAdvisors, Inc.
Summary by Margaret O’Hara
This presentation was a highly interactive look at
issues debated by subscribers to e-mail, web, and newsgroup-based discussion
groups. The format was simple: Karen presented an issue, discussed it briefly
and then asked the audience to vote on it. Then there was a brief discussion of
why the answers came out as they did.
Voting was performed in an interesting manner. People in
the audience were given Post-It notes and they could stick them to one of 5
boards, depending on how strongly they felt about an issue. Not everyone had the
notes (the group was too large for that) but there were enough people with
voting ability to make the results interesting.
Among the issues discussed were whether conceptual data
models were used (the results were evenly distributed on a 1-5 scale), whether a
good data model needed classwords (results were definitely skewed toward 1, for
always) and whether surrogate or natural keys were preferred (results were dead
center at 3). The surrogate key issue generated a great deal of discussion; it
was obvious that the audience felt very strongly about this issue.
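For readers new to the debate, a small sqlite3 sketch of the two options, using a hypothetical customer table: the natural-key version makes a business identifier the primary key, while the surrogate version adds a generated integer and keeps the business identifier unique. The column names end in classwords (_num, _name, _id), the other naming question put to a vote; both the names and the classword choices are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Natural key: the business identifier itself is the primary key.
conn.execute("""CREATE TABLE customer_natural (
    tax_num       TEXT PRIMARY KEY,
    customer_name TEXT NOT NULL)""")

# Surrogate key: a generated integer with no business meaning;
# the business identifier is still kept unique.
conn.execute("""CREATE TABLE customer_surrogate (
    customer_id   INTEGER PRIMARY KEY,
    tax_num       TEXT NOT NULL UNIQUE,
    customer_name TEXT NOT NULL)""")

conn.execute("INSERT INTO customer_natural VALUES ('12-3456789', 'Acme Corp')")
cur = conn.execute("INSERT INTO customer_surrogate (tax_num, customer_name) "
                   "VALUES ('12-3456789', 'Acme Corp')")
new_id = cur.lastrowid  # SQLite assigns the INTEGER PRIMARY KEY value

# If the business identifier changes, the surrogate design updates one column here;
# with the natural key, every table that copied tax_num as a foreign key changes too.
conn.execute("UPDATE customer_surrogate SET tax_num = '98-7654321' WHERE customer_id = ?",
             (new_id,))
conn.commit()
print(conn.execute("SELECT customer_id, tax_num FROM customer_surrogate").fetchall())
```

Natural-key advocates would reply that the surrogate design needs an extra unique constraint and an extra join whenever users search by tax number; the sketch shows the mechanics, not a verdict.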
The session pointed out two significant things. First,
even a group of data administrators and data managers cannot agree on
everything. Secondly, the voting method used was quite effective for taking a
quick pulse of a large group and can be used in other similar situations.
Conference Session | Speaker
Data Stewardship - Fact or Fiction? | Diana C. Young, President, Applied Information Strategies
Summary by Linda Kresl
Diana began this presentation
by explaining that the term data stewardship has been tossed around for the past
decade. Stewardship is … the recognition that all individual components of an
enterprise serve to ensure the future of the total organization.
The
main stewardship objective is to provide high-quality information that meets the
needs of the business:
· Getting the Right Information
·