Binary Journal of Data Mining & Networking
http://arjournals.org/index.php/bjdmn
Vol-2; Issue 2; Page 01-28
ISSN: 2229–7170; ICV: 5.09
Editor-in-Chief: Dr. John William
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access, http://opcit.eprints.org/oacitation-biblio.html).
Time-Series Data Mining: A Review
http://arjournals.org/index.php/bjdmn/article/view/1645
Data mining extracts knowledge by analyzing data from different perspectives and consolidating the results into useful information that helps decision makers take appropriate decisions. Classification and clustering have been the two broad areas in data mining. Whereas classification is a supervised learning approach, clustering is an unsupervised learning approach and hence can be performed without the supervision of domain experts. The basic concept is to group objects so that similar objects are closer to each other. Time-series data are observations recorded over a period of time. Parameter estimation, outlier detection, and transformation of the data are some of the basic issues in handling time-series data. An approach is given for clustering the data based on the membership values assigned to each data point, reducing the effect of outliers or noise present in the data. Possibilistic Fuzzy C-Means (PFCM) with Error Prediction (EP) is used for clustering and noise identification in the time-series data.
Suman H. Pal, Jignasa N. Patet
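To make the idea of membership values concrete, the sketch below computes standard Fuzzy C-Means memberships for a short series of values. It is an illustration only: it is plain FCM, not the PFCM variant named in the abstract (which also maintains typicality values), it omits the error-prediction step, and the function name, sample series, and cluster centers are all hypothetical.

```python
def fcm_memberships(values, centers, m=2.0):
    """Fuzzy C-Means membership of each value in each cluster.

    u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)); each row sums to 1.
    """
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in values:
        # Clamp distances so a value sitting exactly on a center does not divide by zero.
        dists = [max(abs(x - c), 1e-12) for c in centers]
        row = [1.0 / sum((d_k / d_j) ** power for d_j in dists) for d_k in dists]
        memberships.append(row)
    return memberships


# Hypothetical series: two regimes (around 1 and around 5) plus one in-between reading.
series = [1.0, 1.1, 0.9, 5.0, 5.2, 3.0]
centers = [1.0, 5.1]  # assumed cluster centers, not estimated here
U = fcm_memberships(series, centers)
```

Readings near a center get a membership close to 1 for that cluster, while the in-between reading is split almost evenly between both, which is the kind of signal a possibilistic extension can use to flag noise.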
Copyright (c)
Vol. 5, Issue 1, pp. 01-04; DOI: 10.5138/bjdmn.v5i1.1645
Mathematics Problem Solving using Metacognition Aspect
http://arjournals.org/index.php/bjdmn/article/view/1636
If students are to excel in both routine mathematics skills and problem-solving skills, teachers must emphasize both mathematical content and mathematical processes in the teaching and learning of mathematics. This paper presents the theoretical rationale for, and the importance of, metacognition in the learning of mathematics. A project was conducted with students of around sixteen years of age, and the findings indicated that students did employ the four phases of problem solving emphasized by George Polya. However, students fared better when they regulated their thinking process or employed metacognitive skills while solving mathematics problems. This paper also suggests the strength of a mixed methodology in research, which expands understanding from one methodology to another and converges findings from different data sources.
Prem Pratap Singh, Amit Sisodiya
Vol. 5, Issue 1, pp. 05-07; DOI: 10.5138/bjdmn.v5i1.1636
Data mining intelligent system for decision making based on ERP
http://arjournals.org/index.php/bjdmn/article/view/1637
As Enterprise Resource Planning (ERP) implementation has become more popular and suitable for every business organization, it has become an essential factor in the success of a business. This paper presents an integration of ERP with Customer Relationship Management (CRM). Data mining underpins the integration in this model by supporting the application of the best-suited algorithm to obtain a successful result. The model has three major parts: an outer view (CRM), an inner view (ERP), and a knowledge discovery view. The CRM collects the customer's queries, the ERP analyzes and integrates the data, and the knowledge discovery view gives predictions and advice for the betterment of the organization. For the practical implementation of the presented model, we use MADAR data and apply the Apriori algorithm to it. New rules and patterns are then suggested to the organization, which help it resolve customers' problems in future correspondence.
Gouri Gosawi, Dinesh Moriya
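The Apriori step mentioned in the abstract can be sketched as follows. This is a minimal, generic level-wise Apriori over a tiny in-memory list of transactions, not the authors' implementation; the sample "customer query" transactions are invented for illustration and are not the MADAR data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical customer-query topics, one set per customer contact.
queries = [
    {"billing", "login"},
    {"billing", "refund"},
    {"billing", "login", "refund"},
    {"login"},
]
result = apriori(queries, min_support=2)
```

With a minimum support of 2, the pairs {billing, login} and {billing, refund} are frequent, while {login, refund} is not, so the three-item set is pruned without being counted.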
Vol. 5, Issue 1, pp. 08-12; DOI: 10.5138/bjdmn.v5i1.1637
Data Visualization and Techniques
http://arjournals.org/index.php/bjdmn/article/view/1638
Data visualization is the graphical representation of information. Bar charts, scatter graphs, and maps are examples of simple data visualizations that have been used for decades. Information technology combines the principles of visualization with powerful applications and large data sets to create sophisticated images and animations. A tag cloud, for instance, uses text size to indicate the relative frequency of use of a set of terms. In many cases, the data that feed a tag cloud come from thousands of Web pages, representing perhaps millions of users. All of this information is contained in a simple image that you can understand quickly and easily. More complex visualizations sometimes generate animations that demonstrate how data change over time. In an application called Gapminder, bubbles represent the countries of the world, with each nation's population reflected in the size of its bubble. You can set the x and y axes to compare life expectancy with per capita income, for example, and the tool will show how each nation's bubble moves on the graph over time. You can see that higher income generally correlates with longer life expectancy, but the visualization also clearly shows that China does not follow this trend: in 1975, the country had one of the lowest per capita incomes but one of the longer life expectancies. The animation also shows the steep drop in life expectancy in many sub-Saharan African countries starting in the early 1990s (corresponding to the AIDS epidemic in that part of the world) and the plummeting of life expectancy in Rwanda at the time of that nation's genocide.
Akansha Sharma, Prem Pratap Singh
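The tag-cloud idea, text size as a function of term frequency, can be sketched in a few lines. The linear scaling rule, the point-size range, and the function name below are assumptions chosen for illustration; real tag clouds often use logarithmic scaling and stop-word filtering.

```python
from collections import Counter


def tag_sizes(text, min_pt=10, max_pt=40):
    """Map each word to a font size that grows linearly with its frequency."""
    counts = Counter(text.lower().split())
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        word: min_pt + (n - lo) * (max_pt - min_pt) / span
        for word, n in counts.items()
    }


sizes = tag_sizes("data mining data visualization data mining charts")
```

Here "data" (three occurrences) gets the largest size, "mining" (two) lands midway, and the words seen once get the smallest size.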
Vol. 5, Issue 1, pp. 13-15; DOI: 10.5138/bjdmn.v5i1.1638
Data Mining Based on Association Rule Privacy Preserving
http://arjournals.org/index.php/bjdmn/article/view/1639
The security of a large database that contains crucial information becomes a serious issue when the data are shared over a network, where they must be protected against unauthorized access. Privacy-preserving data mining is a new research trend concerning data privacy in data mining and statistical databases. Association analysis is a powerful tool for discovering relationships hidden in large databases. Association rule hiding algorithms offer strong and efficient performance for protecting confidential and crucial data. Data modification and rule hiding are among the most important approaches for securing data. The objective of the proposed association rule hiding algorithm for privacy-preserving data mining is to hide certain information so that it cannot be discovered through association rule mining algorithms. The main approach of association rule hiding algorithms is to hide some of the generated association rules by increasing or decreasing the support or the confidence of the rules, so that the sensitive items, whether on the Left Hand Side (LHS) or the Right Hand Side (RHS) of a generated rule, cannot be deduced through association rule mining algorithms. The Increase Support of Left Hand Side (ISL) algorithm decreases the confidence of a rule by increasing the support value of the LHS. It does not work on both sides of the rule; it modifies only the LHS. In the Decrease Support of Right Hand Side (DSR) algorithm, the confidence of the rule is decreased by decreasing the support value of the RHS; it modifies only the RHS. We propose a new algorithm that solves the problems of both. It can increase and decrease the support of the LHS and RHS items of the rule correspondingly, so that more rules are hidden with fewer modifications. The efficiency of the proposed algorithm is compared with the ISL and DSR algorithms on real databases, on the basis of the number of rules hidden, CPU time, and the number of modified entries, and it achieves better results.
Sulakshana Dubey, Arun Sen
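The effect of ISL and DSR on a rule's confidence follows from confidence(X -> Y) = support(X and Y) / support(X). The sketch below applies one simplified modification step of each kind to a toy database. It illustrates the two published ideas only, not the combined algorithm the abstract proposes; the function names, the choice of which transaction to modify, and the data are assumptions.

```python
def support(transactions, items):
    """Number of transactions containing every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, lhs, rhs):
    """confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)."""
    return support(transactions, lhs | rhs) / support(transactions, lhs)


def isl_step(transactions, lhs, rhs):
    """ISL idea: add the LHS to one transaction lacking both LHS and RHS."""
    for i, t in enumerate(transactions):
        if not lhs <= t and not rhs <= t:
            return transactions[:i] + [t | lhs] + transactions[i + 1:]
    return transactions


def dsr_step(transactions, lhs, rhs):
    """DSR idea: remove the RHS from one transaction supporting the whole rule."""
    for i, t in enumerate(transactions):
        if lhs | rhs <= t:
            return transactions[:i] + [t - rhs] + transactions[i + 1:]
    return transactions


# Toy database; the rule to hide is A -> B.
db = [frozenset("AB"), frozenset("AB"), frozenset("A"), frozenset("C"), frozenset("BC")]
lhs, rhs = frozenset("A"), frozenset("B")

before = confidence(db, lhs, rhs)
after_isl = confidence(isl_step(db, lhs, rhs), lhs, rhs)
after_dsr = confidence(dsr_step(db, lhs, rhs), lhs, rhs)
```

ISL raises the denominator (support of the LHS) while DSR lowers the numerator (support of the whole rule); either way the confidence drops, and repeating the step until it falls below the mining threshold hides the rule.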
Vol. 5, Issue 1, pp. 16-21; DOI: 10.5138/bjdmn.v5i1.1639
A Roadmap: Designing and Construction of Data Warehouse
http://arjournals.org/index.php/bjdmn/article/view/1640
Data warehousing is not about the tools. Rather, it is about creating a strategy to plan, design, and construct a data store capable of answering business questions. Good strategy is a process that is never really finished. A defined data warehouse development process provides a foundation for reliability and reduction of risk. This process is defined through methodology. Reliability is pivotal in reducing the costs of maintenance and support. Data warehouse development enjoys high visibility; many firms have concentrated on reducing these costs. Standardization and reuse of the development artifacts and the deliverables of the process can reduce the time and cost of the data warehouse's creation. In today's business world, data warehouses are increasingly being used to help companies make strategic business decisions. To understand how a warehouse can benefit you and what is required to manage a warehouse, you must first understand how a data warehouse is constructed and established.
Dinesh Moriya, Gouri Gosawi
Vol. 5, Issue 1, pp. 22-25; DOI: 10.5138/bjdmn.v5i1.1640
FP-Growth Tree Based Algorithms Analysis: CP-Tree and K Map
http://arjournals.org/index.php/bjdmn/article/view/1641
We propose a novel frequent-pattern tree (FP-tree) structure; our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported frequent-pattern mining methods. The FP-tree method is an efficient algorithm for mining frequent patterns in association mining, whether the patterns are long or short. By using a compact tree structure and a partitioning-based, divide-and-conquer search method, it can reduce search costs substantially. Alternatives include using multiple CPUs or reducing computer memory use to solve the problem. This approach can also noticeably decrease the cost of exchanging and combining control information, and the algorithm complexity is greatly decreased as well, so the problem is solved efficiently. Even when a multi-CPU technique is adopted, the added requirement is basically hardware, and the performance improvement is still limited. The question is whether there is any other way to reduce these costs in FP-tree construction, given that the performance improvement is still limited.
Neelesh Shrivastava, Richa Khanna
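The compactness of the FP-tree comes from sharing prefixes between transactions once their items are sorted by global frequency. The sketch below shows only the standard tree-construction step, without header-table links and without the recursive FP-growth mining phase; it is not the CP-tree or K-map variant from the title, and the class name, function name, and data are hypothetical.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    # First pass: count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, n in counts.items() if n >= min_support}
    root = FPNode(None)
    # Second pass: insert each transaction as a path of its frequent items,
    # ordered by descending global count (ties broken by item name).
    for t in transactions:
        ordered = sorted(
            (item for item in set(t) if item in frequent),
            key=lambda item: (-counts[item], item),
        )
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root


transactions = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
root = build_fp_tree(transactions, min_support=2)
```

The first three transactions share the prefix "a", so they occupy a single branch whose counts record how many transactions pass through each node; the infrequent item "d" never enters the tree.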
Vol. 5, Issue 1, pp. 26-29; DOI: 10.5138/bjdmn.v5i1.1641
Analysis And Implementation Of K-Mean And K-Medoids Algorithm For Large Dataset To Increase Scalability And Efficiency
http://arjournals.org/index.php/bjdmn/article/view/1642
The experiments are carried out on both synthetic and real data sets. The synthetic data sets used in our experiments were generated using the procedure. We refer readers to it for more details on the generation of large data sets. We report experimental results on two synthetic data sets. In the first data set, the average transaction size and the average maximal potentially frequent item set size are set, while the number of process in the large dataset is set. It is a sparse data set. The frequent item sets are short and also numerous. The second synthetic data set we used is. Its average transaction size and average maximal potentially frequent item set size are set to 30 and 32 respectively. There are exponentially many frequent item sets in this data set when the support threshold goes down. There are also fairly long frequent item sets, as well as a large number of short frequent item sets, in it. It contains abundant mixtures of short and long frequent item sets.
Anjani Pandey, Mahima Shukla
Vol. 5, Issue 1, pp. 30-32; DOI: 10.5138/bjdmn.v5i1.1642
An Association of Efficient Mining by Compressed Database
http://arjournals.org/index.php/bjdmn/article/view/1643
Data mining can be viewed as a result of the natural evolution of information technology. The spread of computing has led to an explosion in the volume of data to be stored on hard disks and sent over the Internet. This growth has led to a need for data compression, that is, the ability to reduce the amount of storage or Internet bandwidth required to handle the data. This paper analyzes the various data mining approaches that are used to compress the original database into a smaller one and perform the data mining process on the compressed transactions, such as M2TQT, the PINCER-SEARCH algorithm, the APRIORI and ID3 algorithms, the TM algorithm, AIS and SETM, the CT-Apriori algorithm, CBMine, the CTITL algorithm, and FIUT-Tree. Among these techniques, M2TQT uses the relationship between transactions to merge related transactions and builds a quantification table to prune candidate item sets that cannot become frequent, in order to improve the performance of mining association rules. Thus M2TQT is observed to perform better than existing approaches.
Anjani Pandey, Gayatri Singh
Vol. 5, Issue 1, pp. 33-35; DOI: 10.5138/bjdmn.v5i1.1643
A Hybrid Based Recommendation System based on Clustering and Association
http://arjournals.org/index.php/bjdmn/article/view/1644
Recommendation systems play an important role in filtering and customizing the desired information. Recommender systems are divided into three categories, namely collaborative filtering, content-based filtering, and hybrid filtering, which are the most widely adopted techniques. The main aim of this paper is to recommend the most suitable items to the user. The approach taken in this paper is to cluster the data and then apply association mining over the clusters. The paper describes different hybridization methods and discusses various limitations of current recommendation methods, such as the cold-start problem, the gray-sheep problem, and how to find the similarity between users and items. It also discusses possible extensions that can improve recommendation capabilities in a range of applications, such as improved understanding of users and items, incorporation of contextual information into the recommendation process, and support for multicriteria ratings.
Jaimeel M. Shah, Lokesh Sahu
Vol. 5, Issue 1, pp. 36-40; DOI: 10.5138/bjdmn.v5i1.1644