XPath for HTML markup

XPath

Web Scraping

Web scraping is a technique for extracting data from a web page by traversing the DOM (Document Object Model) of its HTML. XPath is one common way to select the nodes that you want to extract.

XPath nodes:
There are seven kinds of nodes:
  1. Element - It represents any HTML 5 element. For example, <strong></strong>
  2. Attribute - It represents a single attribute of an HTML 5 element in the document object model. For example, lang in <html lang="en">
  3. Text - It represents the text between an opening HTML 5 tag and its closing tag
  4. Namespace - It represents a namespace (a prefix bound to a URI) that is in scope for an element. It is not a pseudo selector
  5. Processing-Instruction - It represents a processing instruction such as <?xml-stylesheet?>, which is mainly found in XML documents
  6. Comment - It represents any HTML 5 comment
  7. Root - It represents the top of the tree and is the parent of the topmost element, which is called the root element. For example, the root element of any HTML 5 document is <html>
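The node kinds above can be seen in a parsed document. The following is a minimal sketch that uses Python's standard xml.dom.minidom, whose DOM node types are used here as a stand-in for XPath's node kinds (the DOM has no separate namespace node, so that kind is not shown). The sample markup and variable names are made up for illustration.

```python
from xml.dom import Node, minidom

# Hypothetical sample document containing most of the node kinds.
DOC = (
    '<?xml-stylesheet href="style.css"?>'
    '<html lang="en"><!-- greeting --><strong>Hello</strong></html>'
)

doc = minidom.parseString(DOC)          # root (document) node
pi = doc.firstChild                      # processing instruction
html = doc.documentElement               # root element <html>
comment, strong = html.childNodes        # comment node, then <strong> element
lang = html.getAttributeNode("lang")     # attribute node
text = strong.firstChild                 # text node inside <strong>

for label, node in [
    ("root", doc),
    ("processing-instruction", pi),
    ("element", html),
    ("comment", comment),
    ("attribute", lang),
    ("text", text),
]:
    # nodeType is an integer constant defined on xml.dom.Node
    print(label, node.nodeType)
```

Note that the root node (doc) and the root element (html) are two different nodes: the first is the parent of the second.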
An XPath expression that starts with / begins at the root node. Suppose I have an HTML5 code snippet as follows:
  <div>
    <ul>
      <li>Gaurav</li>
      <li>Shirodkar</li>
    </ul>
  </div>
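Then, to find the 2nd list item in the unordered list, the XPath is /div/ul/li[2]. XPath positions start at 1, not 0, so li[1] would select the first item (Gaurav) and li[2] selects the second (Shirodkar).

The XPath is read from left to right. The leading / denotes the root node of the document, and div selects the <div> element directly under it. This treats the snippet as a standalone document. In a full HTML page, where the <div> sits inside <html> and <body>, the path would instead begin with /html/body/div. Next, /ul selects the unordered list that is a child of the <div>. Finally, /li[2] selects the second list item among the children of the unordered list, which is its parent node in the Document Object Model (DOM).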
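To check the indexing, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the snippet is a <div> wrapping a <ul> with the two list items. ElementTree supports only a subset of XPath, and its paths are relative to the element you search from, so ul/li[2] evaluated on the <div> corresponds to /div/ul/li[2].

```python
import xml.etree.ElementTree as ET

# Assumed markup for the two-item list discussed above.
SNIPPET = """
<div>
  <ul>
    <li>Gaurav</li>
    <li>Shirodkar</li>
  </ul>
</div>
""".strip()

root = ET.fromstring(SNIPPET)  # root element is <div>

# Position predicates are 1-based: [1] is the first <li>, [2] the second.
first_item = root.find("ul/li[1]")
second_item = root.find("ul/li[2]")

# Without a predicate, every <li> child of <ul> is selected.
all_items = [li.text for li in root.findall("ul/li")]

print(first_item.text)
print(second_item.text)
print(all_items)
```

For real web pages, which are often not well-formed XML, a dedicated HTML parser with full XPath support is usually a better fit than ElementTree.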
