Abstract:
When dealing with semistructured data such as that available on the Web, it becomes important to infer the inherent structure, both for the user (to facilitate querying) and for the system (to optimize access). We consider the problem of identifying some underlying structure in large collections of semistructured data. Since we expect the data to be fairly irregular, this structure consists of an approximate classification of objects into a hierarchical collection of types. We propose a notion of a type hierarchy for such data, an algorithm for deriving the type hierarchy, and rules for assigning types to data elements.