XPath for HTML markup

XPath

Web Scraping

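Web scraping is a technique for extracting data from web pages by traversing the DOM (Document Object Model) of an HTML page. XPath is a query language for selecting nodes in that tree, and it is one of the most common ways to pick out the data a scraper needs.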
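As a minimal sketch of XPath-based scraping, the Python code below uses the standard library's xml.etree.ElementTree. That module supports only a limited subset of XPath and requires well-formed markup. The page content and the scrape_list_items function are hypothetical stand-ins: a real scraper would download the page first, and a library such as lxml (lxml.html.fromstring(...).xpath(...)) handles full XPath and messy real-world HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical page content; a real scraper would download this
# (for example with urllib.request) instead of hard-coding it.
PAGE = """
<html>
  <body>
    <div>
      <ul>
        <li>Gaurav</li>
        <li>Shirodkar</li>
      </ul>
    </div>
  </body>
</html>
"""


def scrape_list_items(page: str) -> list[str]:
    """Return the text of every <li> that sits in a <ul> directly inside a <div>."""
    root = ET.fromstring(page)  # root is the <html> element
    # ".//div/ul/li" is ElementTree's limited XPath: a <div> at any depth,
    # then its <ul> children, then their <li> children.
    return [li.text for li in root.findall(".//div/ul/li")]


if __name__ == "__main__":
    print(scrape_list_items(PAGE))  # ['Gaurav', 'Shirodkar']
```

The same idea scales to real pages: fetch the markup, parse it into a tree, then let one path expression do the selecting instead of hand-written loops.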

XPath nodes:
XPath models a document as a tree with seven kinds of nodes:
  1. Element - It represents any HTML 5 element. For example, <strong></strong>
  2. Attribute - It represents any one attribute of any HTML 5 element in the document object model
  3. Text - It represents the text between the opening HTML 5 tag and a closing HTML 5 tag
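  4. Namespace - It represents a namespace binding that is in scope for an element, i.e. a prefix mapped to a namespace URI, such as the one declared by xmlns="http://www.w3.org/1999/xhtml"
  5. Processing instruction - It represents an instruction for an application, such as <?xml-stylesheet href="style.css"?> in an XML or XHTML document. The HTML 5 syntax parses these as comments, so they mostly appear in XHTML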
  6. Comment - It represents any HTML 5 comment
  7. Root (document) - It represents the document itself, the topmost node of the tree. It is not an element: its single element child, called the document element, is <html> in any HTML 5 document
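To see most of these node kinds in an actual tree, the sketch below parses a small, hypothetical XHTML document with Python's xml.dom.minidom and labels each node. This uses the DOM rather than an XPath engine, so it is only an analogy: the DOM has no separate namespace node, and minidom reports the xmlns declaration as an ordinary attribute. The DOC string, the KINDS table and the walk function are all illustrative names.

```python
from xml.dom import minidom

# A small, well-formed XHTML document (hypothetical) containing a
# processing instruction, elements, attributes, text and a comment.
DOC = """<?xml-stylesheet type="text/css" href="style.css"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<body><!-- a comment --><strong class="note">Hello</strong></body>
</html>"""

# Map DOM nodeType constants to the matching XPath node kind names.
KINDS = {
    minidom.Node.DOCUMENT_NODE: "root (document)",
    minidom.Node.ELEMENT_NODE: "element",
    minidom.Node.ATTRIBUTE_NODE: "attribute",
    minidom.Node.TEXT_NODE: "text",
    minidom.Node.COMMENT_NODE: "comment",
    minidom.Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}


def walk(node, out):
    """Collect (kind, name) pairs for node, its attributes and all descendants."""
    out.append((KINDS[node.nodeType], node.nodeName))
    attrs = node.attributes  # None for non-element nodes
    if attrs:
        for i in range(attrs.length):
            attr = attrs.item(i)
            out.append((KINDS[attr.nodeType], attr.name))
    for child in node.childNodes:
        walk(child, out)
    return out


if __name__ == "__main__":
    for kind, name in walk(minidom.parseString(DOC), []):
        print(kind, name)
```

Calling walk(minidom.parseString(DOC), []) yields pairs such as ("processing-instruction", "xml-stylesheet") and ("comment", "#comment"), and the first pair is always the document node.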
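An XPath location path is built from steps separated by /, and a leading / selects the root node. Suppose I have the following HTML5 code snippet:
  <div>
    <ul>
      <li>Gaurav</li>
      <li>Shirodkar</li>
    </ul>
  </div>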
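To find the 2nd list item of the unordered list, the XPath is /div/ul/li[2]. The path is read from left to right. The leading / selects the root (document) node, and div selects the <div> element that is its child, i.e. the document element of this snippet. Next, /ul selects the <ul> children of that <div>, and /li[2] selects the second <li> child of the <ul>, which is its parent node in the Document Object Model (DOM). XPath positions start at 1, not 0, so li[1] selects the first item (Gaurav) and li[2] selects the second (Shirodkar). In a complete HTML page, where <html> is the document element, the equivalent absolute path is /html/body/div/ul/li[2], or //div/ul/li[2] to match the <div> at any depth.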
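The path can be checked with Python's xml.etree.ElementTree, again only as a sketch. ElementTree evaluates paths relative to an element, so the absolute /div/ul/li[2] is written ./ul/li[2] starting from the <div>. The markup is assumed to be the two-item list from the example, wrapped in a <div>; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The two-item list as a stand-alone XML document, so <div> is the
# document element (the only element child of the root node).
SNIPPET = "<div><ul><li>Gaurav</li><li>Shirodkar</li></ul></div>"

div = ET.fromstring(SNIPPET)

# /div/ul/li[n] written relative to <div>; positions start at 1.
first = div.find("./ul/li[1]")
second = div.find("./ul/li[2]")

print(first.text)   # Gaurav
print(second.text)  # Shirodkar

# In a full page, <html> is the document element, so the same item is
# /html/body/div/ul/li[2], written relative to <html> below.
page = ET.fromstring("<html><body>" + SNIPPET + "</body></html>")
print(page.find("./body/div/ul/li[2]").text)  # Shirodkar
```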
