Posts

XPath for HTML markup

XPath
Web Scraping
Web scraping is a technique for extracting data from a web page by traversing the DOM (Document Object Model) of its HTML. XPath is one of the query languages that makes this traversal possible.
XPath nodes
There are seven kinds of nodes:
Element - Represents any HTML5 element. For example, <strong></strong>
Attribute - Represents a single attribute of an HTML5 element in the document object model
Text - Represents the text between an opening HTML5 tag and its closing tag
Namespace - Represents a namespace declaration that is in scope for an element
Processing-Instruction - Represents a processing instruction embedded in the document
Comment - Represents any HTML5 comment
Root - Represents the top of the tree. The topmost element of the tree is called the root element. For example, the root element of any HTML5 document is html
The first XPath expression is /, which selects the root of the document.
Suppose I have an HTML5 code snippet as follows: Gaurav Shirodkar Then to ...
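As a rough illustration of selecting element, attribute, and text nodes with XPath, here is a minimal Python sketch. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup. The markup, the class names, and the helper names strong_texts and author_name are assumptions made for this example, not the snippet from the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup used only for this illustration.
MARKUP = """
<html>
  <body>
    <p class="author"><strong>Gaurav Shirodkar</strong></p>
    <p class="note">XPath <strong>nodes</strong></p>
  </body>
</html>
""".strip()

root = ET.fromstring(MARKUP)  # root is the <html> element


def strong_texts(tree):
    """Return the text node of every <strong> element anywhere below tree."""
    return [el.text for el in tree.findall(".//strong")]


def author_name(tree):
    """Select the <strong> inside the <p> whose class attribute is 'author'."""
    el = tree.find(".//p[@class='author']/strong")
    return el.text if el is not None else None


print(strong_texts(root))
print(author_name(root))
```

Real web pages are often not well-formed XML, so a scraper would usually parse them with an HTML-aware parser first; that step is left out here.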

Parallel Database design, query processing

Parallel Database design, query processing, and case study
In this post, we analyze the concepts of parallel database design and query processing, followed by a case study.
Parallel database design covers hardware architecture, data partitioning, and query processing.
Hardware Architecture: Shared Memory, Shared Disk, Shared Nothing.
Data Partitioning: Partitioning a relation involves distributing its tuples over several disks. There are three kinds: Range Partitioning, Round-robin Partitioning, and Hash Partitioning.
Range partitioning is ideal for point and range queries on the partitioning attribute.
Hash partitioning is ideal for point queries based on the partitioning attribute. Ideal for sequential sca...
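The three partitioning schemes named above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a database implementation: each "disk" is a plain list, and the function names, the key callback, and the use of CRC32 as the hash function are assumptions made for the example.

```python
from bisect import bisect_right
from zlib import crc32


def round_robin_partition(tuples, n_disks):
    """Send the i-th tuple to disk i mod n_disks."""
    disks = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples):
        disks[i % n_disks].append(t)
    return disks


def hash_partition(tuples, n_disks, key):
    """Send each tuple to the disk chosen by hashing its partitioning attribute."""
    disks = [[] for _ in range(n_disks)]
    for t in tuples:
        h = crc32(str(key(t)).encode("utf-8"))  # stable across runs
        disks[h % n_disks].append(t)
    return disks


def range_partition(tuples, boundaries, key):
    """Send each tuple to the disk whose value range contains its attribute.

    boundaries must be sorted; len(boundaries) + 1 disks are used.
    """
    disks = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        disks[bisect_right(boundaries, key(t))].append(t)
    return disks


rows = [(5, "a"), (10, "b"), (15, "c"), (25, "d")]
print(range_partition(rows, [10, 20], key=lambda r: r[0]))
print(round_robin_partition(rows, 3))
```

With range partitioning, a range query on the attribute only has to visit the disks whose ranges overlap the query. With hash partitioning, a point query visits exactly one disk, but a range query has to visit all of them.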

PHP Topics | Latest Features | Lesser known Facts | Important for Entrance Exams

In this post, I am going to add my other personal projects that I have worked on in various technologies.
Reflection in PHP5
$class = new \ReflectionClass('MyClass');
$method = new \ReflectionMethod('MyClass', 'method');
$args = array();
$method->invoke($method->getDeclaringClass()->newInstanceArgs($args));
Traits in PHP
A Trait is similar to a class, but is only intended to group functionality in a fine-grained and consistent way. It is not possible to instantiate a Trait on its own. It is an addition to traditional inheritance and enables horizontal composition of behavior; that is, the application of class members without requiring inheritance. Traits avoid the problems of multiple inheritance.
Is a parent constructor called implicitly in PHP?
When a constructor is defined for a class in PHP, the parent construct...

Apache Hadoop | Running MapReduce Jobs

After setting up your environment and running the HDFS and YARN daemons, we can start running MapReduce jobs on our local machine. We need to compile our code, produce a JAR file, move our inputs, and run a MapReduce job on Hadoop.
Step 1 - Configure extra environment variables
As a preface, it is best to set up some extra environment variables to make running jobs from the CLI quicker and easier. You can name these environment variables anything you want, but we will name them HADOOP_CP and HDFS_LOC to avoid potential conflicts with other environment variables. Open the Start Menu, type in 'environment', and press Enter. A new window with System Properties should open up. Click the Environment Variables button near the bottom right.
HADOOP_CP environment variable
This is used to compile your Java files. The backticks (e.g. `some command here`) do not work on Win...

Apache Hadoop Prerequisites and Installation

Please follow the steps given in the following link for setting up Hadoop on Windows 10 machines: How to set up Hadoop on Windows 10
The following files are needed for any Hadoop project:
hadoop-hdfs-3.3.4.jar (Java Archive File)
zookeeper-3.6.4.jar (Java Archive File)
log4j-1.2-api-2.19.0.jar (Java Archive File)
hadoop-mapreduce-client-core-3.3.4.jar (Java Archive File)
hbase-0.92.1.jar (Java Archive File)
hadoop-common-3.3.4.jar (Java Archive File)
An IDE (Integrated Development Environment) like Eclipse
Click on the Perspective menu in the menu bar and select MapReduce as the perspective. After successfully installing Hadoop, we have the directory structure given below in the hadoop-3.2.1 directory.
Eclipse IDE Setup and initialization
Configuration
The binary executables to start the name node, data nodes, mapreduce sites,...

Apache Hadoop | Use Case | Association Rule Mapping | Part 1

The classes required to run a Hadoop application:
org.apache.zookeeper.*;
org.apache.log4j.*;
org.apache.hadoop.conf.Configuration;
org.apache.hadoop.fs.Path;
org.apache.hadoop.hbase.*;
org.apache.hadoop.hbase.HBaseConfiguration;
org.apache.hadoop.hbase.client.HTable;
org.apache.hadoop.hbase.client.Put;
org.apache.hadoop.hbase.client.Get;
org.apache.hadoop.hbase.util.Bytes;
org.apache.hadoop.hbase.client.Result;
org.apache.hadoop.hbase.client.ResultScanner;
org.apache.hadoop.hbase.client.Scan;
org.apache.hadoop.io.IntWritable;
org.apache.hadoop.io.LongWritable;
org.apache.hadoop.io.ObjectWritable;
org.apache.hadoop.io.Text;
org.apache.hadoop.mapreduce.Job;
org.apache.hadoop.mapreduce.Mapper;
org.apache.hadoop.mapreduce.Reducer;
org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
org.apache.hadoop.mapreduce.lib.o...

Apache Hadoop | Implementation

Starting with Apache Hadoop
System analysis and Design
This article explains how the system is analyzed to carry out the work for the proposed system. System analysis is the process of gathering and interpreting facts, diagnosing problems, and using the facts to improve the system. System analysis does more than solve the current problem, especially when no comparable system exists yet and one has to be developed. The future needs of the business and the changes required to meet those needs are analyzed. Once the decision is made, a plan is developed to implement the recommendations. The plan includes all system design features, such as new data capture needs (storage system), operating systems, equipment, and personnel needs. The system design is like a blueprint: it specifies all the features that are to be in the finished product.
Class diagram
The class diagram is static. It represents the static view of an application. It...

Starting with Apache Hadoop

In Hadoop, a single master manages many slaves. The master node consists of a JobTracker, TaskTracker, NameNode, and DataNode. A slave or worker node acts as both DataNode and TaskTracker, though it is possible to have data-only worker nodes and compute-only worker nodes. The NameNode holds the file system metadata. The files are broken up and spread over the DataNodes, and the JobTracker schedules and manages jobs. The TaskTracker executes the individual map and reduce functions. If a machine fails, Hadoop continues to operate the cluster by shifting work to the remaining machines. The input file, which resides on a distributed file system throughout the cluster, is split into even-sized chunks that are replicated for fault tolerance. Hadoop divides each MapReduce job into a set of tasks. Each chunk of input is processed by a map task, which outputs a list of key-value pairs. In Hadoop, the shuffle phase o...
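The flow described above (chunks go to map tasks, map tasks emit key-value pairs, the pairs are grouped by key, and a reduce step combines each group) can be sketched in plain Python. This is an in-memory simulation that runs on one machine and does not use the Hadoop API; the word-count job and the names map_task, shuffle, reduce_task, and run_job are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_task(chunk):
    """Process one chunk of input and emit a list of (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_task(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(chunks):
    """Run every map task, shuffle the output, then reduce each key group."""
    pairs = [pair for chunk in chunks for pair in map_task(chunk)]
    return dict(reduce_task(k, v) for k, v in shuffle(pairs).items())


print(run_job(["big data", "Big cluster"]))
```

In a real cluster the map tasks run in parallel on the nodes that hold the chunks, and a failed task is rerun on another machine. The sketch keeps only the data flow.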

Handling of Big Data on the Internet

Handling text files that are about 4000-5000 lines long is difficult and time-consuming. It brings input/output overhead, poor scalability, exposure to hardware faults, unnecessary repetition of code, wasted memory space, and errors that are hard to process and to keep from propagating. The term "Big Data" applies to the above data. Suppose I have a file that is in the English language and is about 4000-5000 lines long, with an average of 100 characters per line and a few anomalies such as blank lines.
Heuristics
The heuristics we are going to use are as follows:
Articles (a, an, the)
Prepositions (of, at, about, around, besides, aside, above, over)
Conjunctions (and, between, or, because, hence, since, although, though, not only, but also, but, so, therefore)
Adverbs (adverbs are easy to recognize as they mostly have "ly" as their suffix)
Pronouns (I, he, we, our, their, she, it) ...
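One way to apply the word lists above is to tag each token of a line with its category. The sketch below is only a guess at how the heuristics might be used, since the excerpt does not show what the post does with them. The names WORD_CLASSES, classify, and tag_line are hypothetical, and the two-word conjunctions ("not only", "but also") are left out because the sketch looks at one token at a time.

```python
# Word lists copied from the heuristics above (single-word entries only).
WORD_CLASSES = {
    "article": {"a", "an", "the"},
    "preposition": {"of", "at", "about", "around", "besides", "aside", "above", "over"},
    "conjunction": {"and", "between", "or", "because", "hence", "since",
                    "although", "though", "but", "so", "therefore"},
    "pronoun": {"i", "he", "we", "our", "their", "she", "it"},
}


def classify(word):
    """Return the category of a single token, or 'other' if no heuristic matches."""
    w = word.lower().strip(".,;:!?\"'()")
    for label, members in WORD_CLASSES.items():
        if w in members:
            return label
    # Suffix heuristic: most adverbs end in "ly".
    if len(w) > 2 and w.endswith("ly"):
        return "adverb"
    return "other"


def tag_line(line):
    """Tag every whitespace-separated token; a blank line yields an empty list."""
    return [(token, classify(token)) for token in line.split()]


print(tag_line("The cluster quickly recovers because it replicates data"))
```

The "ly" rule misclassifies words such as "family" or "apply", so it works as a rough filter and not as a part-of-speech tagger.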