International Journal of Engineering and Management Research (IJEMR)
Year : 2017, Volume : 7, Issue : 5
First page : (52) Last page : (56)
Print ISSN : 2394-6962. Online ISSN : 2250-0758.

Near Duplicate URL Detection for Removing Dust Unique Key

Santhi R. Vijaya

Research Scholar, Department of Computer Science, Tamil University College, Thanjavur, Tamil Nadu, India

Online published on 8 December, 2017.

Abstract

Conventional parallel algorithms for mining frequent itemsets aim to balance load by partitioning data equally among a group of computing nodes. However, existing parallel Frequent Itemset Mining algorithms have serious performance issues: in a big data environment, they suffer high communication and mining overhead induced by redundant data transmitted among computing nodes. We address this problem by developing a data partitioning approach using the MapReduce programming model. The aim of this paper is to enhance the performance of parallel Frequent Itemset Mining on Hadoop clusters. The proposed model incorporates a similarity metric and the Locality-Sensitive Hashing technique, and uses the VUK (Valid Unique Key) DUST-removing technique, with LDA-CRATS mining data used to run the approach. The approach derives quality rules by taking advantage of a multi-sequence alignment strategy. We demonstrate that a full multi-sequence alignment of URLs with duplicated content, performed before the rules are generated, can lead to the deployment of very effective rules. In our evaluation, the method achieved larger reductions in the number of duplicate URLs than our best baseline, with gains of 85 to 150.76 percent in two different web collections.
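As a rough illustration of the alignment idea mentioned in the abstract, the sketch below aligns the tokens of several URLs that are assumed to serve duplicated content and turns the positions where they disagree into wildcards, giving one candidate normalization rule. It is not the paper's VUK algorithm, whose details are not given here: the function names (`tokenize`, `align_pair`, `derive_pattern`, `matches`, `canonical_key`), the example URLs, and the use of progressive pairwise alignment via Python's `difflib` in place of a true multi-sequence alignment are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

WILDCARD = None  # marks a position where the duplicate URLs disagree


def tokenize(url):
    """Split a URL into tokens, keeping delimiters as their own tokens."""
    return [t for t in re.split(r"([/?&=.:_-])", url) if t]


def align_pair(a, b):
    """Align two token lists; every region where they differ becomes one wildcard."""
    out = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        elif not out or out[-1] is not WILDCARD:
            out.append(WILDCARD)
    return out


def derive_pattern(urls):
    """Fold each URL into a running consensus (progressive alignment)."""
    pattern = tokenize(urls[0])
    for url in urls[1:]:
        pattern = align_pair(pattern, tokenize(url))
    return pattern


def matches(pattern, url):
    """Check whether a URL is covered by the derived rule."""
    regex = "".join(".*?" if t is WILDCARD else re.escape(t) for t in pattern)
    return re.fullmatch(regex, url) is not None


def canonical_key(pattern):
    """Render the rule as a single key shared by all URLs it covers."""
    return "".join("*" if t is WILDCARD else t for t in pattern)


# Hypothetical cluster of URLs assumed to return the same content.
example_urls = [
    "http://example.com/story?id=1&sid=aaa",
    "http://example.com/story?id=1&sid=bbb",
    "http://example.com/story?id=1&sid=ccc",
]
example_pattern = derive_pattern(example_urls)
print(canonical_key(example_pattern))
```

In this toy cluster only the `sid` value varies, so the derived rule keeps every other token literal and treats that value as irrelevant to the content. A crawler could then map any URL covered by the rule to the same canonical key before fetching it.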


Keywords

MapReduce, URL duplicated, multi-sequence, Hadoop.
