Monday, 7 November 2011

Lab 2 - XML vs. HTML

Questions:

  1. Have a look at the documents below. Some are XML documents, some are HTML documents, some may be neither. See if you can decide which are which.
  
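Document 1 is an HTML file, as indicated by its <head> and <body> tags and by presentational elements such as <table>, <font> and <center>. It also uses <meta> tags to supply keywords and a description, metadata of the kind read by search engines. Several tags in the excerpt (<p>, <li>) are never closed, which HTML tolerates but XML does not.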

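Document 2 is an XML file. It has a <!DOCTYPE> declaration (a document type declaration, not the XML declaration) that points to a DTD, advertisement.dtd. It also links to an XSL stylesheet, ad1.xsl, through an <?xml-stylesheet?> processing instruction, which means its data can be transformed and formatted for display.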


Document 3 is an HTML file, as it also makes use of the <head> tag. It declares a form containing a drop-down list and a button, and it has some JavaScript declared in the <head>.


Document 4 is an XML file, again containing a <!DOCTYPE> pointing to a DTD. This particular document does not link to any XSL stylesheet for formatting.

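Document 5 is neither XML nor HTML. It has a <!DOCTYPE> with an inline DTD, but the element declarations use SGML tag-omission markers (- - and - O), and the <title>, <stanza> and <line> tags are never closed, so it is SGML rather than well-formed XML.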


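Document 6 is another XML file pointing to a DTD, countryList.dtd. Every element is closed and properly nested, so it is well-formed, although the <!DOCTYPE> names countryCollection while the root element is countryList, so it would not validate as written.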


Document 7 is an HTML file containing JavaScript, with the same characteristics as the other HTML files: <head> and <body> tags, a form, and tags such as <P> that are never closed.


  2. Make a list of the distinctive characteristics of an XML document, in terms of things that you can spot when looking at the code.

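  1. An XML document should start with an XML declaration (<?xml version="1.0"?>). It is optional, but recommended.
  2. An XML document may reference a DTD through a <!DOCTYPE> declaration, or an XSL stylesheet through an <?xml-stylesheet?> processing instruction.
  3. Every opening tag must be closed, and elements must be correctly nested.
  4. Attribute values must be placed within quotes.
  5. An XML document contains exactly one root element.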
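
Points 3 to 5 of the checklist above can be tested mechanically. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree parser; the function name is_well_formed and the four sample strings are made up for illustration and only loosely modelled on the appendix documents. The parser checks well-formedness only. It does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule in the checklist.
closed_and_quoted = '<country><capital id="c1">Tokyo</capital></country>'
unclosed_tag = '<poem><title>The SICK ROSE<stanza><line>O Rose</poem>'
unquoted_attribute = '<img width=468 height=60 />'
two_roots = '<country>Japan</country><country>Kenya</country>'

samples = {
    "closed and quoted": closed_and_quoted,
    "unclosed tag": unclosed_tag,
    "unquoted attribute": unquoted_attribute,
    "two root elements": two_roots,
}

for label, sample in samples.items():
    print(label, "->", is_well_formed(sample))
```

Only the first sample should be accepted; the other three each break one of the rules in the list.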

  3. Assume that (with suitable changes) all these documents could become XML documents. Imagine, and describe, applications that could use these documents.

Document 1 could hold a list of surgical procedures from ancient Egyptian times.

Document 2 holds the data for a classified advert (contact, payment, advert text and publication details), so it could be used by a newspaper's advert booking system.

Document 3 is used to select files and load them.

Document 4 holds all the information and structure of a story and/or a book.

Document 5 holds all the information and structure of an anthology of poems.

Document 6 stores specific information on every country.

Document 7 holds various information about countries; specific information from it can then be displayed depending on what the user chooses.
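
As one illustration of such an application, here is a minimal sketch of how a program might read the country data of Document 6. It assumes Python's standard xml.etree.ElementTree module; the function names capitals_by_country and major_cities are hypothetical, and COUNTRY_XML is a trimmed copy of the appendix data with the <!DOCTYPE> line left out so that the snippet needs no DTD file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Document 6 (two countries, DOCTYPE line omitted).
COUNTRY_XML = """
<countryList>
  <country>
    <officialName>United States of America</officialName>
    <capital>Washington, D.C.</capital>
    <majorCity> Los Angeles </majorCity>
    <majorCity> New York </majorCity>
  </country>
  <country>
    <officialName> Japan </officialName>
    <capital>Tokyo</capital>
    <majorCity> Nagoya </majorCity>
  </country>
</countryList>
"""


def capitals_by_country(xml_text: str) -> dict:
    """Map each country's official name to its capital."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for country in root.findall("country"):
        # The source data pads some values with spaces, so strip them.
        name = country.findtext("officialName", default="").strip()
        capital = country.findtext("capital", default="").strip()
        result[name] = capital
    return result


def major_cities(xml_text: str, official_name: str) -> list:
    """List the major cities recorded for one country, or [] if it is absent."""
    root = ET.fromstring(xml_text.strip())
    for country in root.findall("country"):
        if country.findtext("officialName", default="").strip() == official_name:
            return [city.text.strip() for city in country.findall("majorCity")]
    return []


print(capitals_by_country(COUNTRY_XML))
print(major_cities(COUNTRY_XML, "Japan"))
```

Because the element names describe the data rather than its appearance, the same file could feed a quiz, an atlas index or a search page without being changed.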



Appendix: XML and HTML documents for identification:

NB I’ve removed one or more lines from each document to stop it being too easy.


<head>
<title>The Edwin Smith Surgical Papyrus</title>
<meta name="Keywords" content="Egypt, egypt, egypt travel, medical, Egyptian, egyptian, Papyrus, Edwin Smith, Surgical, literature, stories, instructions, Memphis, Nile, Cairo, Alexandria, Admonitions of Ipuwer, pharaoh,  pharaonic">
<meta name="Description" content="The Edwin Smith Surgical Papyrus on ancient Egyptian medical treatments">
</head>
<body background="Back25.jpg" bgcolor="#FFFFFF" text="#000000"
link="#808000" vlink="#008080">
<table border="0" cellspacing="1" width="565">
  <tr>
    <td><center>
<!-- START ADCYCLE IFRAME RICH MEDIA CACHE-BUST CODE for Top of Member Pages -->
<script language="javascript"><!--
var id=305; var jar=new Date();var s=jar.getSeconds();var m=jar.getMinutes();
var flash=s*m+id;var cgi='http://ads.touregypt.net/cgi-bin/adcycle';
var p='<iframe src="'+cgi+'/adcycle.cgi?gid=1&t=_top&id='+flash+'&type=iframe" ';
p+='height=60 width=468 border=0 marginwidth=0 marginheight=0 hspace=0 ';
p+='vspace=0 frameborder=0 scrolling=no>';
p+='<a href="'+cgi+'/adclick.cgi?gid=1&id='+flash+'" target="_top">';
p+='<img src="'+cgi+'/adcycle.cgi?gid=1&id='+flash+'" width=468 height=60 ';
p+='border=1 alt="Click to Visit"></a></iframe>'; document.write(p); // -->
</script><noscript><a href="http://ads.touregypt.net/cgi-bin/adcycle/adclick.cgi?gid=1&id=305" target="_top">
<img src="http://ads.touregypt.net/cgi-bin/adcycle/adcycle.cgi?gid=1&id=305" width=468 height=60 border=1></a></noscript>
<!-- END ADCYCLE IFRAME RICH MEDIA CODE -->
</center>
   </td>
  </tr>
</table>
<table border="0" width="570">
    <tr>
 <td>
 <p align="center">
 <font size="3">
 <b>The Edwin Smith Surgical Papyrus</b>
 </font>
 <p>
 <font size="3">
 The Edwin Smith Surgical Papyrus, dating from the seventeenth century B.C., is one of the oldest of all known medical papyri. Its differs fundamentally from the others in the following ways:
 </font>
 <font face="verdana,arial,helvetica" size="3">
 <P>
 </font>
 <ol>
 <li>
 <font size="3">
 The seventeen columns on the recto comprise part of a surgical treatise, the first thus far discovered in the ancient Orient, whether in Egypt or Asia. It is therefore the oldest known surgical treatise.
 </font>
 <li>
 <font size="3">



<!DOCTYPE advertisement SYSTEM "advertisement.dtd">
<?xml-stylesheet type="text/xsl" href="ad1.xsl" ?>
<advertisement action="update">
                <id version="2">
                                NYT.19980701.12345.107
                </id>
                <status value="accepted"></status>
                <expiration>
                                19980731
                </expiration>
                <reference>
                                Ad to sell Linda's car.
                </reference>
                <comment>
                                Up sold to add Friday repeat.
                </comment>
                <contact id="contact1">
                                <name>
                                                John Smyth
                                </name>
                                <address>
                                                <address_line>c/o Bat Accessories, Inc.</address_line>
                                                <address_line>Hitchcock Building, 80th Floor</address_line>
                                                <address_line>1313 Mockingbird Lane</address_line>
                                                <city>New York</city>,
                                                <state>NY</state> 
                                                <postal>10000-1234</postal>
                                                <country>USA</country>
                                </address>
                                <phone>
                                                19085551212
                                </phone>
                                <fax>
                                                19085551213
                                </fax>
                                <email>
                                                jsymth@batacc.com
                                </email>
                                <url>
                                                http://www.batacc.com/~smyth
                                </url>
                </contact>
                <source>
                                <updated>
                                                <timestamp>
                                                                19980701 12290200
                                                </timestamp>
                                                <userid>
                                                                JK1892
                                                </userid>
                                </updated>
                                <created>
                                                <timestamp>
                                                                19980701 12225800
                                                </timestamp>
                                                <userid>
                                                                JK1892
                                                </userid>
                                </created>
                                <base version="1">
                                                NYT.19980621.90810.98
                                </base>
                </source>
                <advertiser>
                                <account type="transient">
                                                19085551212-1
                                </account>
                                <contact_ref link="contact1"></contact_ref>
                                <payment>
                                                <charge>
                                                                <charge_card brand="amex"></charge_card>
                                                                <charge_account>3710-111111-99995</charge_account>
                                                                <charge_expiration>19991231</charge_expiration>
                                                                <contact_ref link="contact1"></contact_ref>
                                                                <charge_authorization status="allowed">4561</charge_authorization>
                                                </charge>
                                </payment>
                </advertiser>
                <coding>
                                <automotive>
                                                <auto_side value="sell">sell</auto_side>
                                                <auto_category value="used">used</auto_category>
                                                <auto_year>1991</auto_year>
                                                <auto_make>Saab</auto_make>
                                                <auto_model>900 Convertible</auto_model>
                                                <auto_mileage>72000</auto_mileage>
                                                <auto_price>$13,900</auto_price>
                                                <auto_exterior>white</auto_exterior>
                                                <auto_interior>gray leather</auto_interior>
                                                <auto_body value="convertible">convertible</auto_body>
                                                <auto_vin>372AB918098910X</auto_vin>
                                </automotive>
                                <contact>
                                                <name></name>
                                                <phone>19085551212</phone>
                                </contact>
                </coding>
                <text>
                                <font size="10">
                                                <center>
                                                                <keyword name="auto_make" punct=" ">SAAB </keyword>
                                                                <keyword name="auto_model" punct=" ">900SE </keyword>
                                                </center>
                                </font>
                                <keyword name="auto_year" punct=" ">1997 </keyword>
                                <keyword name="auto_exterior" punct=" ">yellow </keyword>
                                <keyword name="auto_body" punct=", ">convertible, </keyword>
                                <keyword name="auto_mileage" format="9'k miles'" scale="1000"
                                                                                                punct=", ">14k miles, </keyword>
                                Auto, PL, PW, AC, power leather Seats
                                Showroom cond. Assume lease.
                                <center>
                                                Call
                                                <keyword name="phone" format="T999-999-9999" punct=" ">
                                                                212-333-3333
                                                </keyword>
                                </center>
                </text>
                <publication name="nytimes">
                                <pub_alias>
                                                981011301
                                </pub_alias>
                                <pub_price>
                                                $128.00
                                </pub_price>
                                <pub_options>
                                                <claim>
                                                                7
                                                </claim>
                                                <columns>
                                                                1
                                                </columns>
                                                <forwarding collect="email">
                                                                Please email replies to <mailbox>T1234</mailbox>@nytimes.com
                                                                <rate basis="Email forwarding service charge--Full run"
                                                                                                unit="ad">$25.00
                                                                </rate>
                                                </forwarding>
                                                <tearsheet>
                                                                <rate basis="Tear sheet service charge" unit="recipient">$20.00</rate>
                                                </tearsheet>
                                                <shading>
                                                                <rate basis="Shading premium" unit="standard">20%</rate>
                                                </shading>
                                </pub_options>
                                <class>
                                                3720
                                                <title>Autos/Vans/Sports Utilities</title>
                                                <classword>Automotive</classword>
                                                <classword>For Sale</classword>
                                                <classword>Used</classword>
                                                <lines>
                                                                4
                                                </lines>
                                                <sortkey>
                                                                SAAB91900
                                                </sortkey>
                                                <zone>
                                                                M
                                                                <title>Full Run</title>
                                                </zone>
                                                <rundate>
                                                                19980719
                                                                <rate basis="Automotive, Open, Sunday NY Region"
                                                                                                unit="line">$23.10
                                                                </rate>
                                                                <instance>
                                                                                <edition>BASE</edition>
                                                                                <section>12</section>
                                                                                <page>22</page>
                                                                                <column>9</column>
                                                                                <offset>17.85</offset>
                                                                </instance>
                                                                <instance>
                                                                                <edition>LI</edition>
                                                                                <section>12</section>
                                                                                <page>18</page>
                                                                                <column>9</column>
                                                                                <offset>17.85</offset>
                                                                </instance>
                                                                <instance>
                                                                                <edition>NJ</edition>
                                                                                <section>12</section>
                                                                                <page>18</page>
                                                                                <column>9</column>
                                                                                <offset>17.85</offset>
                                                                </instance>
                                                                <instance>
                                                                                <edition>NY/LI</edition>
                                                                                <section>12</section>
                                                                                <page>18</page>
                                                                                <column>9</column>
                                                                                <offset>17.85</offset>
                                                                </instance>
                                                                <instance>
                                                                                <edition>WC</edition>
                                                                                <section>12</section>
                                                                                <page>18</page>
                                                                                <column>9</column>
                                                                                <offset>17.85</offset>
                                                                </instance>
                                                </rundate>
                                                <rundate>
                                                                19980724
                                                                <rate basis="Automotive, Open, Weekday NY Region--
                                                                                                Sunday ad repeated on Friday (within 7 days)"
                                                                                                unit="line">$8.90
                                                                </rate>
                                                                <rate basis="Automotive, Open, Weekday NY Region"
                                                                                                unit="line" type="comparison">$15.20
                                                                </rate>
                                                                <instance>
                                                                                <edition>METRO</edition>
                                                                                <section>6</section>
                                                                                <page>14</page>
                                                                                <column>6</column>
                                                                                <offset>5.15</offset>
                                                                </instance>
                                                </rundate>
                                </class>
                </publication>
</advertisement>




<head>
                <title>Display</title>

<script language="JavaScript">

function loadFile ()

                {

                var filename
               
                var selectionValue
               
                selectionValue = document.forms[0].selectList.selectedIndex
               
                filename = document.forms[0].selectList.options[selectionValue].value
               
                parent.rightFrame.location = filename           
               
                }



</script>

</head>

<body>

<H4>XML File Chooser</H4>

<P>Select the file you wish to see displayed in the right-hand frame.

<FORM NAME="selectForm">

<P>

<SELECT NAME="selectList">

<OPTION VALUE="countryList/countryList.xml"> Country Data

<OPTION VALUE="playerList/playerList.xml"> Baseball Player Data

</SELECT>

<P>

<INPUT TYPE="BUTTON" VALUE="Load Document" onClick="loadFile()">

</FORM>

</body>




<!DOCTYPE story SYSTEM "storyxsl.dtd">
<story>
<title>Freedom's Dream</title>
<author>by Charles White</author>
<copyright>Copyright 1996, 1999 by Charles White</copyright>
<section>
<para>Had it been a dream, Antron Crimea's memory of the clenched fist piercing the
sky of a tumultuous, thundering crowd would have been bearable solitude. As
it was though, the reality brought him to another place, to a distance only
something like a dream could take him.</para>
<para>&quot;The crowd forgot everything,&quot; is how Antron described the situation to his
psychiatrist, <link id="ChesapeakeLink">
Chesapeake Alert.</link>
Antron remembered the rhythm, the pulse,
everything. After all this time the energy of the crowd still seemed to reverberate through
his head.</para>
<para>Chesapeake Alert was nothing but a large bulbous mass of jelly-like flesh; a
brain plopped down on an empty, expensive slice of carpet. And though he
had no legitimate locomotive capabilities of his own, he was aware of the
movements of a billion others.</para>
<para>Antron's hundred legs crawled around what was left of the carpet in the kind of
pace unknown to you or I. His earlier confusion had long ago been dissolved
by the righteous events of what he had seen during the course of events Billy
Freedom had ignited.</para>
<para>&quot;Sometimes betrayal is a necessity,&quot;said Chesapeake. &quot;Startling. And
expensive. It must be weighed carefully.&quot;</para>
</section>
<auto-link
                xml:link="simple"
                actuate="user"
                href="sec_2.xml"
                show="replace">click here to continue</auto-link>
</story>


<!DOCTYPE my.dtd [
    <!ELEMENT anthology      - -  (poem+)>
    <!ELEMENT poem           - -  (title?, stanza+)>
    <!ELEMENT title          - O  (#PCDATA) >
    <!ELEMENT stanza         - O  (line+)   >
    <!ELEMENT line           O O  (#PCDATA) >
]>
<my.dtd>
<anthology>
         <poem><title>The SICK ROSE
         <stanza>
              <line>O Rose thou art sick.
              <line>The invisible worm,
              <line>That flies in the night
              <line>In the howling storm:
         <stanza>
              <line>Has found out thy bed
              <line>Of crimson joy:
              <line>And his dark secret love
              <line>Does thy life destroy.
          <poem>
              <!-- more poems go here    -->
 
    </anthology>
</my.dtd>



<!DOCTYPE countryCollection SYSTEM "countryList.dtd">
<countryList>
                <country>
                                <officialName>United States of America</officialName>
                                <label>Common Names:</label>                  
                                <commonName>United States</commonName>
                                <commonName>U.S.</commonName>
                                <label>Capital:</label>                
                                <capital>Washington, D.C.</capital>
                                <label>Major Cities:</label>                                                         
                                <majorCity> Los Angeles </majorCity>
                                <majorCity> New York </majorCity>                             
                                <majorCity> Chicago </majorCity>                               
                                <majorCity> Dallas </majorCity>                  
                                <label>Bordering Bodies of Water:</label>                                                     
                                <borderingBodyOfWater> Atlantic Ocean </borderingBodyOfWater>
                                <borderingBodyOfWater> Pacific Ocean </borderingBodyOfWater>                   
                                <borderingBodyOfWater> Gulf of Mexico </borderingBodyOfWater>  
                                <label>Bordering Countries:</label>                                              
                                <borderingCountry> Canada </borderingCountry>                                                           
                                <borderingCountry> Mexico </borderingCountry>
</country>
                <country>
                                <officialName> Japan </officialName>
                                <label>Common Names:</label>                                                  
                                <commonName> Japan </commonName>
                                <label>Capital:</label>                
                                <capital>Tokyo</capital>
                                <label>Major Cities:</label>                                                         
                                <majorCity> Nagoya </majorCity>
                                <majorCity> Osaka </majorCity>                  
                                <majorCity> Kobe </majorCity>                    
                                <label>Bordering Bodies of Water:</label>
                                <borderingBodyOfWater> Sea of Japan </borderingBodyOfWater>
                                <borderingBodyOfWater> Pacific Ocean </borderingBodyOfWater>                   
                </country>
                <country>
                                <officialName> Republic of Kenya </officialName>
                                <label>Common Names:</label>                                                  
                                <commonName> Kenya </commonName>
                                <label>Capital:</label>                
                                <capital> Nairobi </capital>
                                <label>Major Cities:</label>                                                         
                                <majorCity> Mombasa </majorCity>
                                <majorCity> Lamu </majorCity>
                                <majorCity> Malindi </majorCity>                 
                                <majorCity> Kisumu </majorCity>                
                                <label>Bordering Bodies of Water:</label>
                               
                                <borderingBodyOfWater> Indian Ocean </borderingBodyOfWater>
                </country>
</countryList>


<head>
<script LANGUAGE="JavaScript">
//The global variable containing the XML string we'll examine. Normally, global variables are to be avoided. But here, it's the easiest way to
//approach the problem, since it would be easy to have a server-side script include the contents of the XML file as a single line here.
gXMLString = " <officialName> United States of America </officialName> <commonName> United States </commonName> <commonName> U.S. </commonName> <capital> Washington, D.C. </capital> <majorCity> Los Angeles </majorCity> <majorCity> New York </majorCity> <majorCity> Chicago </majorCity> <majorCity> Dallas </majorCity> <borderingBodyOfWater> Atlantic Ocean </borderingBodyOfWater> <borderingBodyOfWater> Pacific Ocean </borderingBodyOfWater> <borderingBodyOfWater> Gulf of Mexico </borderingBodyOfWater> <borderingCountry> Canada </borderingCountry> <borderingCountry> Mexico </borderingCountry>"

function findTagsPresent()
{
var arrayOfPieces = new Array()
arrayOfPieces = gXMLString.split(" ")
numberOfPieces = arrayOfPieces.length
var tagsPresent = new Array()
var tagsPresentCounter
tagsPresentCounter = 0
for (i=0; i<numberOfPieces; i++)
                {
               
                if ((arrayOfPieces[i].indexOf("<") == 0) && (arrayOfPieces[i].indexOf(">") == (arrayOfPieces[i].length-1)) && (arrayOfPieces[i].indexOf("</") == -1))
                                // If that's the case, then we've found an opening tag.
                                {
                                var arrayLength
                                arrayLength = tagsPresent.length
                                var foundIt
                                foundIt = false
                                for (j=0; j<arrayLength; j++)
                                                {
                                                                if (tagsPresent[j] == arrayOfPieces[i])
                                                                                {
                                                                                foundIt = true            
                                                                                break
                                                                                }
                                                }
                                if (foundIt != true)
                                                //And if that's the case, it's not already in tagsPresent
                                                {              
                                                tagsPresent[tagsPresentCounter] = arrayOfPieces[i]
                                                tagsPresentCounter++
                                                }
                                }
                }
return tagsPresent
}
function writeListOfTagsPresentWithCheckboxes()
{
var listOfTags
listOfTags = findTagsPresent()
var listLength
listLength = listOfTags.length
var numberOfCheckBoxes
numberOfCheckBoxes = 0
for (i=0; i<listLength; i++)
                {
                var tagStringLength
                tagStringLength = listOfTags[i].length
                var strippedTagString
                strippedTagString = listOfTags[i].substring(1, (tagStringLength-1))
                document.write("<BR>")
                document.write("<INPUT TYPE='checkbox' NAME='box" + i + "' VALUE='" + strippedTagString + "'> &nbsp;")
                document.write(strippedTagString)
                numberOfCheckBoxes++
                }
document.write("<P>")
document.write("<INPUT TYPE='button' value='Display' onClick='displaySelectedXMLData(" + numberOfCheckBoxes + ")'>")
}              
function contentsTaggedThisWay(tagString)
{
var arrayOfPieces = new Array()
arrayOfPieces = gXMLString.split(" ")
var numberOfPieces
numberOfPieces = arrayOfPieces.length       
var taggedData
taggedData = ""
var i
i = 0
while (i<numberOfPieces)
                {              
                if (arrayOfPieces[i] == ("<" + tagString + ">"))
                                {
                                var foundEndTag
                                taggedData += "<BR>"
                                foundEndTag = false
                                var j
                                j = 1
                                while (!(foundEndTag))
                                                {
                                                if (arrayOfPieces[(i + j)] == ("</" + tagString + ">"))
                                                                {                              
                                                                foundEndTag = true                    
                                                                }                              
                                                                                else                                         
                                                                                                {                                              
                                                                                                taggedData += arrayOfPieces[(i + j)]                                             
                                                                                                taggedData += " "
                                                                                                j++
                                                                                                }
                                                }
                                }
                i++
                }              
return taggedData      
}
function displaySelectedXMLData(numberOfBoxes)
{
var stringToWrite
stringToWrite = ""
parent.rightFrame.location.reload()
stringToWrite = "<HTML> <HEAD> </HEAD> <BODY>"
var i
i=0
while (i<numberOfBoxes)
                {
                currentBoxName = "box" + i
                if (document.selectionForm.elements[currentBoxName].checked)
                                {
                                stringToWrite += "<P><B>" + document.selectionForm.elements[currentBoxName].value + "</B>"
                                stringToWrite += contentsTaggedThisWay(document.selectionForm.elements[currentBoxName].value)
                                }
                i++
                }
stringToWrite += "</BODY> </HTML>"
parent.rightFrame.document.write(stringToWrite)
}              
</script>
</head>
<body>
<H4>Data Chooser</H4>
<P>Choose the tags whose data you want to display.
<FORM NAME="selectionForm">
<SCRIPT LANGUAGE="JavaScript">
writeListOfTagsPresentWithCheckboxes()
</SCRIPT>
<P>
</FORM>
</body>
