Document WIS of a department of a university

The figure below shows another example: the homepage of a university department, consisting of a front page and its result pages. This example shows how semistructured data are represented in our data model.



Let us show how to wrap this page. We assume that this WIS is a structured document; its DTD and an equivalent schema in our data model are given in the figure below.
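The correspondence between a DTD and the schema can be illustrated with a minimal Perl sketch. The actual DTD is only given in the figure, so the element declaration used below is hypothetical, and the mapping rule is an assumption for illustration: a child element that may repeat (`*` or `+`) becomes a multi-valued `setof` attribute, and any other child becomes a single-valued attribute. Only flat sequences are handled; choice groups, nested groups and mixed content are rejected.

```perl
use strict;
use warnings;

# Map a flat DTD content model such as "(deptName, Laboratory*)" to
# attribute kinds of the data model (assumed mapping rule). Choice groups,
# nested groups and mixed content are deliberately not supported.
sub content_model_to_attributes {
    my ($decl) = @_;
    my ($element, $model) = $decl =~ /^<!ELEMENT\s+(\w+)\s+\(([^()]*)\)\s*>$/
        or die "unsupported declaration: $decl\n";
    my @attributes;
    for my $particle (split /,/, $model) {
        my ($name, $occurrence) = $particle =~ /^\s*(\w+)([*+?]?)\s*$/
            or die "unsupported particle: $particle\n";
        # A repeatable child ('*' or '+') becomes a multi-valued (setof)
        # attribute; any other child becomes a single-valued attribute.
        my $kind = $occurrence =~ /^[*+]$/ ? 'setof' : 'single';
        push @attributes, { name => $name, kind => $kind };
    }
    return ($element, \@attributes);
}

# Hypothetical declaration; the real DTD is the one shown in the figure.
my ($element, $attributes) =
    content_model_to_attributes('<!ELEMENT Department (deptName, Laboratory*)>');
print "interface $element\n";
for my $attr (@$attributes) {
    print "   $attr->{name}: ",
        ($attr->{kind} eq 'setof' ? "setof $attr->{name}" : 'single-valued'), "\n";
}
```

Naming of the resulting attributes (for example `laboratories` rather than `Laboratory`) is a design decision of the schema and is not derived here.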



To explain interface definitions, we use the interface definition of Department:

database ISInformation
address http://HOST2/cgi-bin/WrapperExecutor.pl
interface Department
body
   url			 URLDomain,
   academicPage	   link	 Academic,
   deptName		 DeptDomain,
   laboratories    setof Laboratory,
method
public:
   static getByURL(URLDomain $url);
   static getByKeyWord(String $keyword);
private:
   static new(DeptDomain $deptName,OID $labOID);
   printDetail();
implement
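sub new {
  my $package  = shift;
  my $deptName = shift;
  my $labOID   = shift;
  my $this;

  # Create an empty Department object and fill in its attributes.
  $this = newEmptyObject Department();
  $this->{deptName} = $deptName;
  # laboratories is multi-valued: it keeps the OIDs of Laboratory objects.
  push(@{$this->{laboratories}},$labOID);
  bless $this, $package;
  return $this;
}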

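sub getByURL{
  my $package = shift;
  my $url = shift;
  my $resultPageFile = 'S_Major.html';

  # Retrieve the page at $url through modifyHTMLContent() and save the
  # result to a local file, whose name is returned.
  my $resultPage = modifyHTMLContent($url);
  open(my $out, '>', $resultPageFile) or die "Cannot write $resultPageFile: $!";
  print $out $resultPage;
  close $out;
  return $resultPageFile;
}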

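sub getByKeyWord{
  my $package = shift;
  my $keyword = shift;
  my $url1 = 'http://hercules.hol.is.uec.ac.jp:8888/jua/IS/s_major.html';
  my $url2 = 'http://hercules.hol.is.uec.ac.jp:8888/jua/IS/n_major.html';
  my @url = ($url1,$url2);
  my $resultContent = '';
  my $resultFName = 'is_page.html';

  foreach my $url (@url){
    my $pageContent = modifyHTMLContent($url);
    # \Q...\E matches the keyword literally rather than as a regex.
    if ($pageContent =~ /\Q$keyword\E/){
       # Anchor the occurrences in this page before appending it, so that
       # pages appended earlier are not anchored a second time.
       # The anchor name "result" is an assumed placeholder.
       $pageContent =~ s/(\Q$keyword\E)/<A NAME="result">$1<\/A>/g;
       $resultContent .= $pageContent;
    }
  }
  # Assumed reconstruction: put a link to the first match at the top of the page.
  $resultContent = qq{<A HREF="#result">Jump to result</A>\n} . $resultContent
    if $resultContent ne '';
  open(my $out, '>', $resultFName) or die "Cannot write $resultFName: $!";
  print $out $resultContent;
  close $out;
  return $resultFName;
}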

sub printDetail{
  my $this = shift;

  print "Department = $this->{deptName}\n";
  foreach (@{$this->{laboratories}}){
     $_->printDetail();
  }
}

endinterface

This interface definition has four attributes: url, academicPage, deptName and laboratories. The first attribute, url, is an additional element that keeps the URL of the original document page. The second attribute, academicPage, is single-valued and represents the link between the Department page and the Academic page. The third attribute, deptName, is a single-valued string attribute whose domain is DeptDomain. The fourth attribute, laboratories, is multi-valued and keeps the OIDs of instances of the Laboratory interface. Such multi-valued attributes allow us to represent the nested structure of documents.
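A minimal Perl sketch of how a Department instance might be held in memory, assuming (as the `new()` method in the listing suggests) a hash keyed by attribute name. The OIDs, the URL, the department and laboratory names and the `%object_store` table are all hypothetical; the point is only that `laboratories` stores OIDs of other objects rather than the objects themselves, and that following those OIDs recovers the nesting of the page.

```perl
use strict;
use warnings;

# Hypothetical object store mapping OIDs to already extracted objects.
my %object_store = (
    'oid:lab1' => { labName => 'Lab A', staffs => [] },
    'oid:lab2' => { labName => 'Lab B', staffs => [] },
);

# A Department instance as a hash keyed by attribute name. Single-valued
# attributes hold one value; 'laboratories' holds a list of OIDs.
my $dept = {
    url          => 'http://HOST2/dept.html',   # hypothetical page URL
    academicPage => 'oid:academic1',            # hypothetical OID of the linked Academic object
    deptName     => 'Example Department',       # hypothetical DeptDomain value
    laboratories => [ 'oid:lab1', 'oid:lab2' ],
};

# Resolve the OIDs in 'laboratories' to recover the nested structure.
sub lab_names {
    my ($department, $store) = @_;
    return map { $store->{$_}{labName} } @{ $department->{laboratories} };
}

print join(', ', lab_names($dept, \%object_store)), "\n";
```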

The method part declares two types of methods: public and private. Only public methods are invoked explicitly. In this example, getByURL() retrieves the original document with a given URL, and getByKeyWord() retrieves the original documents that contain a given keyword string. new() is a class method that creates a Department object, and printDetail() is an instance method that prints the detailed information of each object. Below are the interface definitions for the laboratory and staff elements of the page.

database ISInformation
address http://HOST2/cgi-bin/WrapperExecutor.pl
interface Laboratory
body
   labName		 LabDomain,
   staffs                setof Staff,
method
public:
private:
   ....
implement
  ...
endinterface


database ISInformation
address http://HOST2/jua/cgi-bin/WrapperExecutor.pl
interface Staff
body
   position		Position,
   fullName		Staff,
method
public:
private:
   ....
implement
   ....
endinterface
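The keyword retrieval done by getByKeyWord() in the Department interface can be sketched without network access as a pure function over page contents. This is a minimal Perl sketch under assumptions: pages are passed in as strings instead of being fetched through `modifyHTMLContent()`, the function name `search_pages`, the anchor name `result` and the link markup are hypothetical, and the keyword is matched literally (`\Q...\E`) rather than as a regular expression.

```perl
use strict;
use warnings;

# Collect every page that contains $keyword, wrap each occurrence in a
# named anchor, and prepend a link to the first occurrence.
# The anchor name "result" is an assumption for illustration.
sub search_pages {
    my ($keyword, @pages) = @_;
    my $result = '';
    for my $page (@pages) {
        next unless $page =~ /\Q$keyword\E/;
        # Mark only this page, so pages appended earlier are not re-marked.
        (my $marked = $page) =~ s/(\Q$keyword\E)/<A NAME="result">$1<\/A>/g;
        $result .= $marked;
    }
    return '' if $result eq '';
    return qq{<A HREF="#result">Jump to result</A>\n} . $result;
}

my @pages = (
    "<P>Database Laboratory</P>\n",
    "<P>Network Laboratory</P>\n",
);
print search_pages('Database', @pages);
```

Marking each page before appending it avoids re-processing content that has already been marked.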


Finally, the figure below shows the instance of a department object extracted from a department page. It is a complex object consisting of multiple simple objects; the structure of the page is mapped onto the structure of the department object.
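The nesting of such a complex object can be illustrated with a small Perl sketch in which Department, Laboratory and Staff are ordinary packages and each printDetail() delegates to the objects it contains, as the Department listing does. Unlike the listing, the contained objects are stored directly instead of as OIDs, and the constructors take named arguments. The Laboratory and Staff methods (whose implementations are elided in the listings) and all data values are hypothetical.

```perl
use strict;
use warnings;

package Staff;
sub new {
    my ($class, %args) = @_;
    return bless { position => '', fullName => '', %args }, $class;
}
sub printDetail {
    my $this = shift;
    print "    Staff = $this->{position} $this->{fullName}\n";
}

package Laboratory;
sub new {
    my ($class, %args) = @_;
    return bless { labName => '', staffs => [], %args }, $class;
}
sub printDetail {
    my $this = shift;
    print "  Laboratory = $this->{labName}\n";
    $_->printDetail() for @{ $this->{staffs} };   # descend into staff objects
}

package Department;
sub new {
    my ($class, %args) = @_;
    return bless { deptName => '', laboratories => [], %args }, $class;
}
sub printDetail {
    my $this = shift;
    print "Department = $this->{deptName}\n";
    $_->printDetail() for @{ $this->{laboratories} };   # descend into laboratories
}

package main;

# Hypothetical instance: one department, one laboratory, two staff members.
my $dept = Department->new(
    deptName     => 'Example Department',
    laboratories => [
        Laboratory->new(
            labName => 'Example Laboratory',
            staffs  => [
                Staff->new(position => 'Professor', fullName => 'A. Example'),
                Staff->new(position => 'Assistant', fullName => 'B. Example'),
            ],
        ),
    ],
);
$dept->printDetail();
```

The indentation of the printed lines mirrors the Department, Laboratory, Staff nesting of the page.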




Last modified: Mon Apr 24 13:21:25 JST 2000