This is a unique book in that it focuses on just one aspect of database design: the individual data item (AKA element, AKA field, AKA column).
I first read Fourth Generation Data when I was a relative newcomer to data modeling and database design. I loved it. Without success, I tried to get a copy of my very own. Now, after more than twelve further years in the field, I have finally obtained that copy, with the help of Amazon, and with it the anticipated reacquaintance.
It has reminded me of my debt to the book. I had come to regard as my own a couple of ideas that are actually the brainchildren of Mr. Tasker. But time, alas, has changed us both, and I am no longer able to give the book the unqualified affection of my younger days.
Still it was a profitable reunion. For you too it will be profitable if data modeling counts among your interests. And nostalgic if you've had that interest as long as I.
The very title inspires nostalgia. Remember 4GLs? Programming languages have evolved much since the appearance of those styled "fourth generation," but we have yet to hear of a "fifth generation." Faster metaphors are needed now. At any rate, Mr. Tasker believed that the advent of non-professional programmers using fourth generation programming languages imposed a streamlined design on data items. Hence the title.
The needs that fourth generation programming languages tried to address are today more effectively addressed by the technology of data warehousing, where Mr. Tasker's insights on design are even more critical. But less topical. His arguments, and those of like-minded practitioners, have won out. No one today seriously disputes them.
It was not always so. Chapter 2 catalogs a few of the errors once commonly found in the designs of data items. These were techniques employed by earlier generations to conserve precious hardware resources at the expense of complicating the data, techniques to be abandoned in the generation of cheap disks and expensive programmers.
Mr. Tasker elaborates his second key premise in Chapter 3: every data item may be classified as either a Label (includes names and id numbers), a Quantity (includes dates and times), or a Description (includes text, graphics, audio, video, and everything else). He dedicates each of the next three chapters to a detailed discussion of one of these classes, building entity-relationship meta-models of the facts pertinent to data items of each class.
Chapter 7 shows how the kinds of validation required by a data item will depend on its classification. It also presents a consolidated meta-model of data item analysis. The basic discussion of data item classes finishes in Chapter 9 with a thorough case study applying the analysis model to the data items on a loan application form.
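To make the idea concrete, here is a minimal Python sketch of validation that depends on a data item's class. The three classes come from the book as described above; the names `ItemClass`, `DataItem`, and `validate`, and the specific checks, are my own illustrative assumptions and not Mr. Tasker's rules.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ItemClass(Enum):
    LABEL = "label"              # names, id numbers
    QUANTITY = "quantity"        # numbers, dates, times
    DESCRIPTION = "description"  # text, graphics, audio, video, everything else


@dataclass(frozen=True)
class DataItem:
    name: str
    item_class: ItemClass


def validate(item: DataItem, value: object) -> bool:
    """Hypothetical per-class checks; the rules are illustrative assumptions."""
    if item.item_class is ItemClass.LABEL:
        # A label identifies something, so require a non-blank string.
        return isinstance(value, str) and value.strip() != ""
    if item.item_class is ItemClass.QUANTITY:
        # A quantity must be a number or a date; booleans are excluded.
        return isinstance(value, (int, float, date)) and not isinstance(value, bool)
    # Descriptions are free-form, so accept any text or raw bytes.
    return isinstance(value, (str, bytes))


loan_amount = DataItem("Loan Amount", ItemClass.QUANTITY)
print(validate(loan_amount, 2500))     # a number passes the Quantity check
print(validate(loan_amount, "2,500"))  # a formatted string does not
```

The point of the sketch is only that the class of the item, rather than the individual item, selects the kind of check applied.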
The rest, Chapters 10 through 17, are dedicated to "special topics." The most interesting of these are data models of the components of names and addresses, the use of access keys, units of measure, and the naming of data items (my personal favorite).
Mr. Tasker unfortunately does not clearly define the essential concept in his analysis: Data Item. Chapter 1 explains that this term is chosen to describe "the smallest unit of data of direct interest to an enterprise" (p. 10). It is regarded as a synonym for any one of the following: field, data element, attribute, column. In Chapter 5, however, it becomes clear that another term, Data Item in Data Group, is intended to fill this role. Data Item is actually a more abstract concept that may be instantiated in each of one or more Data Groups (i.e. Records) as a Data Item in Data Group.
So the common term for Data Item is Domain: " . . . DATA ITEM, its subtypes, and several related entities more concisely capture the information that is intended by the rather loosely defined concept of DOMAIN" (p. 82). Mr. Tasker believes that all of the loose definitions center around one theme: "a domain is intended to represent information related to establishing and maintaining valid instances of a data item . . ." (p. 67).
It remains unclear whether Domains and Data Items must have one instantiating Data Item in Data Group to define the valid instances. Mr. Tasker speaks in one place of "two columns based on the Salary domain" (p. 84), implying that the answer is no. Yet his specific examples of Data Item all involve a defining Data Item in Data Group. If the answer really is no, then why does he speak of Employee Telephone Number as a distinct Data Item? Why not just Telephone Number? He claims that "Employee" provides a context that "must be understood in order to fully understand the individual data items" (p. 186). Presumably, then, Customer Telephone Number would be a distinct Data Item.
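One possible reading of the distinction, in which the answer is no, can be sketched in Python. This is my own interpretation and not a model taken from the book: the class names mirror Mr. Tasker's terms, while `is_phone` and its seven-digit rule are purely hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DataItem:
    """Abstract item (roughly a domain): owns the rule for valid values."""
    name: str
    is_valid: Callable[[object], bool]


@dataclass(frozen=True)
class DataItemInDataGroup:
    """A DataItem as it appears in one data group (record type)."""
    group: str
    item: DataItem

    @property
    def qualified_name(self) -> str:
        # The group supplies the context, e.g. "Employee".
        return f"{self.group} {self.item.name}"


def is_phone(value: object) -> bool:
    # Illustrative rule only: at least seven digits once punctuation is ignored.
    return isinstance(value, str) and sum(c.isdigit() for c in value) >= 7


# One abstract item, instantiated in two data groups.
telephone_number = DataItem("Telephone Number", is_phone)
employee_phone = DataItemInDataGroup("Employee", telephone_number)
customer_phone = DataItemInDataGroup("Customer", telephone_number)

print(employee_phone.qualified_name)
print(customer_phone.qualified_name)
```

Under this reading, the validity rule lives once on Telephone Number and the context lives on each use of it. Treating Employee Telephone Number as a Data Item in its own right, as the book's examples seem to, would instead duplicate the rule per context.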
Mr. Tasker's tripartite classification of Data Items is also flawed. Although he occasionally acknowledges that quantities have "certain label . . . aspects" (p. 54), his scheme misses the fundamental unity of labels and quantities. Assigning a value of "Red" to an automobile to indicate its color is no different from assigning "2,500" to indicate its weight. "2,500" names a number just as "Red" names a color. The only difference is that the name "2,500" was invented to facilitate computation, whereas the name "Red" was invented to facilitate pronunciation. It is interesting to speculate whether Mr. Tasker's classification would differ if "Twenty-five hundred" or "MMD" had been recorded instead.
Flaws notwithstanding, Fourth Generation Data starts you thinking about things that other books ignore. It is recommended as much for the ideas that it triggers as for the good ideas that it contains.
Book reviewed: Fourth Generation Data: A Guide to Data Analysis for Old and New Systems