Tuesday, June 30, 2009

"Data mining" through discovery


Before I start off, I read an article titled “An introduction about data mining” so that I would have at least an idea, some background, or in some way get myself familiarized with the topic. Going further, I have been thinking about how data mining works and how important or valuable it is for people in different fields. Imagine how simple words could turn into a broad range of facts.

Of course, the topic itself should include an extensive range of information, with the purpose of coming up with an idea that gives readers comprehensive knowledge about a certain matter. For that to take place, I myself should arrive at a process, not just the common ones but rather a long process that will formulate my ideas and make them visible to others. This simple thought can be compared to data mining, which is well defined as the extraction of hidden predictive information from large databases. In a nutshell, data mining enables industry to foresee future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions. Still complicated? We’ll go through it...

Basically, data and information are correlated. Data refers to raw facts that describe a certain phenomenon, while information is processed data that carries a particular meaning within a specific context. After seeing the significance and competitive advantages of Information Technology, it is also essential to know the value of data that is not readily visible or is hidden. Apparently, these were all taught in class weeks ago.

These days and onward, digital information is relatively easy to capture and quite inexpensive to store. Agree? Well, the digital revolution has seen collections of data grow in size, and the complexity of the data therein increase. Advances in technology mean that our ability to meaningfully analyse and understand the data we gather lags far behind our ability to capture and store these data. A question that commonly arises from this state of affairs is: having gathered such quantities of data, what do we actually do with it?

Such information may often be usefully analysed using a set of techniques referred to as knowledge discovery or data mining. These techniques essentially seek to build a better understanding of data and, by building characterisations of data that can be used as a basis for further analysis, to extract value from volume. This is indeed interesting yet hard to absorb.

As I go along, it is commonly accepted that the basis for capturing and storing large amounts of data is the belief that there is valuable information implicitly coded within it. An important issue, therefore, is how this hidden information (if it exists at all) can be revealed. Traditional methods of knowledge generation rely largely upon manual analysis and interpretation. However, as data collections continue to grow in size and complexity, there is a corresponding growing need for more sophisticated techniques of analysis.

For this, software is needed. There are actually several techniques, which I read about in one of the given articles:

  • Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
  • Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).
  • Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
  • Nearest neighbour method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbour technique (see the sketch after this list).
  • Rule induction: The extraction of useful if-then rules from data based on statistical significance.
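
To make the nearest neighbour item above a bit more concrete, here is a minimal sketch of the idea in Python. The records, field names and the choice of k are invented for illustration; actual data mining software wraps this up in a far more sophisticated form.

```python
from collections import Counter
import math

# Historical records: (annual income, years as a customer) with a known outcome.
# All of these values are invented purely for illustration.
historical = [
    ((45000, 2), "left"),
    ((52000, 8), "stayed"),
    ((61000, 1), "left"),
    ((58000, 10), "stayed"),
    ((47000, 6), "stayed"),
]

def distance(a, b):
    """Plain Euclidean distance between two records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(record, data, k=3):
    """Classify a record by majority vote among the k most similar historical records."""
    neighbours = sorted(data, key=lambda item: distance(record, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

print(knn_classify((50000, 3), historical))
```

In practice the fields would be normalised first (the income figure dominates the distance here); the sketch is only meant to show the shape of the technique.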

These techniques are really helpful even with small volumes of data, and I think they all come packaged in software. However, there is more to it, I guess. MODELING is a word that is likely familiar to some of you. It is the technique used to perform such feats in data mining: it is simply the act of building a model in one situation where you know the answer and then applying it to another situation where you don't.
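
As a rough illustration of that "build the model where you know the answer, then apply it where you don't" idea, the sketch below learns the simplest possible model, a single balance threshold, from historical records whose outcome is already known, and then applies it to a new record. The figures are assumptions of mine, not anything from the articles.

```python
# Historical ("training") data: account balance and whether the customer
# eventually moved their money out of the bank. Values are invented.
training = [
    (1200, False), (3400, False), (8200, True),
    (9500, True), (2100, False), (7700, True),
]

def learn_threshold(data):
    """Pick the balance threshold that best separates the two known outcomes."""
    best_threshold, best_correct = None, -1
    for threshold, _ in data:
        correct = sum((balance >= threshold) == moved for balance, moved in data)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def apply_model(threshold, balance):
    """Apply the learned model to a record whose outcome we do not know."""
    return balance >= threshold

model = learn_threshold(training)   # built where the answer is known
print(apply_model(model, 6000))     # applied where it is not
```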

Consider, for example, customers of a bank who only use the institution for a checking account. An analysis reveals that after depositing large annual income bonuses, some customers wait for their funds to clear before moving the money quickly into their stock-brokerage or mutual fund accounts outside the bank. This represents a loss of business for the bank, of course.

To persuade these customers to keep their money in the bank, marketing managers can use data mining software to immediately identify large deposits and trigger a response. The system might automatically schedule a direct mail or telemarketing promotion as soon as a customer’s balance exceeds a predetermined amount. Based on the size of the deposit, the triggered promotion can then provide an appropriate incentive that encourages customers to invest their money in the bank’s other products.
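
A small sketch, again with made-up thresholds and wording, of what such a deposit-triggered promotion rule might look like; the real marketing software described above would of course be far more elaborate.

```python
# The trigger amount and incentive tiers are assumptions for illustration only.
TRIGGER_BALANCE = 10000

def choose_incentive(deposit):
    """Scale the offer to the size of the deposit, as the paragraph above describes."""
    if deposit >= 50000:
        return "call from a personal investment adviser"
    if deposit >= 20000:
        return "telemarketing offer for an in-house mutual fund"
    return "direct mail about the bank's other investment products"

def on_deposit(balance, deposit):
    """Fire a promotion as soon as the customer's balance exceeds the predetermined amount."""
    if balance + deposit > TRIGGER_BALANCE:
        return choose_incentive(deposit)
    return None

print(on_deposit(balance=4000, deposit=25000))
```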

Finally, by tracking responses and following rules for attributing customer behavior, certain software can help measure the profitability and ROI of all ongoing campaigns.
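
The ROI part is just arithmetic: return on investment is the profit attributed to a campaign divided by what the campaign cost. A tiny worked example with invented figures:

```python
def campaign_roi(attributed_revenue, campaign_cost):
    """ROI = (revenue attributed to the campaign - its cost) / its cost."""
    return (attributed_revenue - campaign_cost) / campaign_cost

# Invented figures: a mailing that cost 5,000 and brought in 12,000 of new business.
print(f"ROI: {campaign_roi(12000, 5000):.0%}")  # 140%
```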
