Near Duplicate URL Detection for Removing Dust Unique Key

International Journal of Engineering and Management Research (IJEMR)
Year : 2017, Volume : 7, Issue : 5
First page : (52) Last page : (56)
Print ISSN : 2394-6962. Online ISSN : 2250-0758.

Santhi R. Vijaya

Research Scholar, Department of Computer Science, Tamil University College, Thanjavur, Tamil Nadu, India

Online published on 8 December, 2017.

Abstract

Regular parallel algorithms for mining frequent itemsets aim to balance load by partitioning data equally among a group of computing nodes. However, existing parallel frequent itemset mining algorithms have serious performance issues. In big data environments, existing mining algorithms suffer from high communication and mining overhead induced by redundant data transmitted among computing nodes. We address this problem by developing a data partitioning approach based on the MapReduce programming model. The aim of this paper is to enhance the performance of parallel frequent itemset mining on Hadoop clusters. The proposed model incorporates a similarity metric and the Locality-Sensitive Hashing technique, and uses the VUK (Valid Unique Key) DUST-removing technique with LDA-CRATS mining data to run this approach. The approach derives quality rules by taking advantage of a multi-sequence alignment strategy. It demonstrates that a full multi-sequence alignment of URLs with duplicated content, performed before the rules are generated, can lead to the deployment of very effective rules. In our evaluation, the method achieved larger reductions in the number of duplicate URLs than our best baseline, with gains of 85 to 150.76 percent in two different web collections.
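
To make the idea of aligning duplicate URLs before rule generation more concrete, the following is a minimal sketch and not the method of the paper. It assumes a cluster of URLs already known to serve the same content, and it approximates a multi-sequence alignment by folding repeated pairwise alignments from Python's standard `difflib` module. The function names `tokenize`, `align_pair`, and `consensus` are hypothetical, chosen only for this illustration.

```python
import re
from difflib import SequenceMatcher


def tokenize(url):
    """Split a URL into alphanumeric runs and single-character delimiters."""
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", url)


def align_pair(a, b):
    """Align two token lists; keep shared tokens, collapse differences to '*'."""
    pattern = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            # Tokens present in both URLs are kept verbatim.
            pattern.extend(a[i1:i2])
        elif not pattern or pattern[-1] != "*":
            # Any replace/insert/delete region becomes a single wildcard.
            pattern.append("*")
    return pattern


def consensus(urls):
    """Fold a cluster of duplicate URLs into one wildcard pattern.

    Assumption: progressive pairwise alignment stands in for a full
    multi-sequence alignment; the result can depend on the URL order.
    """
    pattern = tokenize(urls[0])
    for url in urls[1:]:
        pattern = align_pair(pattern, tokenize(url))
    return "".join(pattern)
```

For example, calling `consensus` on three URLs that differ only in a session parameter, such as `http://example.com/story?id=10&sid=abc`, `...&sid=xyz`, and `...&sid=q7` (made-up addresses), yields a pattern in which the varying value is replaced by `*`. A pattern of this kind is a plausible starting point for a normalization rule that maps all such variants to a single key.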

Keywords

MapReduce, URL duplicated, multi-sequence, Hadoop.
