Customer behavior in CRM system based on data mining technology


With the rapid development of information technology and the networked economy, business models have undergone fundamental changes. The products and services of many industries are increasingly commoditized, products from different enterprises are increasingly homogeneous, and market competition is ever fiercer. At the same time, customers demand ever higher quality, personalization, and value from products and services. In this environment, establishing and maintaining good customer relationships has become the most important basis for an enterprise to gain competitive advantage. Enterprises must therefore fully collect customer information, accurately understand customer requirements, respond quickly to personalized needs, provide convenient purchase channels and good service, and improve customer satisfaction and loyalty. Customer relationship management (CRM) emerged under these conditions. It is also the inevitable result of a shift in management philosophy, from customer relationships centered on products and sales to customer relationships centered on customer satisfaction. Establishing a customer-centered management system is a strategic decision that bears on the survival and development of an enterprise.

1 Overall framework of customer relationship management

CRM collects, processes, and analyzes large amounts of information on customer behavior to determine the interests, consumption habits, consumption tendencies, and needs of specific consumer groups or individuals. From this it infers the next purchasing behavior of those groups or individuals and, on that basis, carries out targeted marketing with specific content for the identified groups. This improves marketing effectiveness and brings more profit to the enterprise.

Overall, the CRM architecture comprises three levels of application: customer access, business process management, and decision support. Customer access uses e-commerce, call centers, and similar channels to interact with customers and respond quickly. Business process management provides end-to-end quantitative management and work automation for marketing, sales, service, and other departments. Decision support uses data warehouse and data mining technology to support management decisions.

2 Data mining

2.1 Data mining concept

Data mining is the process of extracting hidden, previously unknown, but potentially useful information and knowledge from large amounts of incomplete and random data. The process generally consists of three stages: data preparation, data mining, and interpretation and evaluation of the results. Data mining proper is one step in the overall process of knowledge discovery and consists of specific data mining algorithms. Its purpose is to produce, within acceptable limits of computational efficiency, an enumeration of patterns E over a set of facts F. The main data mining tasks are classification, regression, clustering, and association analysis. The data mining technique used in this paper is classification.

Intuitively, classification builds a model from a training set in which the class labels are known, and then uses that model to predict new data, that is, to determine which class label a given record belongs to. The purpose of classification is to find a model that predicts the value of a target function. The resulting model can take the form of algebraic expressions, decision trees, neural networks, a more complex algorithm, or a combination of these.
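As a minimal sketch of the train-then-predict idea (not the method used in this paper), the Python below learns a one-attribute rule in the spirit of 1R: each value of a chosen attribute is mapped to the majority class observed for it in a labeled training set. The function names `train_one_rule` and `predict`, the attribute `income`, and the records are all hypothetical.

```python
from collections import Counter, defaultdict

def train_one_rule(records, attribute, label="class"):
    """Learn a model from labeled records: map each value of `attribute`
    to the majority class observed for that value."""
    counts = defaultdict(Counter)
    for row in records:
        counts[row[attribute]][row[label]] += 1
    default = Counter(row[label] for row in records).most_common(1)[0][0]
    rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    return rule, default

def predict(model, row, attribute):
    """Predict the class of a new record with the learned rule."""
    rule, default = model
    # Attribute values never seen in training fall back to the overall majority class.
    return rule.get(row[attribute], default)

# Hypothetical training records whose class labels are known.
training = [
    {"income": "high", "class": "buys"},
    {"income": "high", "class": "buys"},
    {"income": "low", "class": "no"},
    {"income": "medium", "class": "buys"},
    {"income": "low", "class": "no"},
]
model = train_one_rule(training, "income")
print(predict(model, {"income": "low"}, "income"))      # no
print(predict(model, {"income": "unknown"}, "income"))  # buys
```

A decision tree generalizes this idea by testing several attributes in sequence instead of one.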

2.2 Data mining process in the customer relationship management system

With the rapid development of database technology and the wide use of database management systems, enterprises have accumulated ever more data. However, few CRM systems process this data in any depth. Current database systems, for example, can only perform data entry, query, and statistics; they cannot discover the relationships and rules in the data, nor predict future trends from existing data. Enterprise decision makers therefore want a CRM system that not only manages large volumes of customer information effectively but also analyzes it at a higher level: discovering the potentially useful information and knowledge hidden in the customer information tables, revealing as fully as possible the regularities in the enterprise's customer base, and finding valuable information to guide business behavior. This process is data mining in a CRM system. The data mining process in the CRM system is shown in Figure 1.

Figure 1 data mining process in CRM system

3 Data selection and preprocessing

The data in this paper come from the sales records of Huitong computer sales Industry Corporation over recent years. Tens of thousands of records are stored in the company's data warehouse. However, users are usually interested in only a subset of the data warehouse, so mining the entire warehouse indiscriminately is impractical. In addition, real-world data are generally noisy, incomplete, and inconsistent. Preprocessing improves data quality, which in turn improves the accuracy and performance of the mining process. In relational databases, selecting the relevant data set and preprocessing it not only makes mining more efficient but also yields more meaningful rules.

3.1 Attribute correlation analysis

When mining data in a data warehouse, most attributes are irrelevant to the mining task or redundant. Both omitting relevant attributes and keeping irrelevant ones are harmful: irrelevant or redundant attributes increase the volume of data, which can slow down mining and degrade system performance. However, it is not easy for users to decide which dimensions and attributes to include in class characterization, so a method for attribute correlation analysis is needed to filter out attributes that are statistically uncorrelated or only weakly correlated with the class.

3.2 Attribute correlation analysis method

To measure how strongly each input attribute is related to the output (class) attribute, information gain can be used. In 1948, Shannon proposed information theory and defined information and entropy.

Entropy is the weighted average of the information of a system's possible outcomes, that is, the average information of the system. The information gain measure is derived from information theory.

Let S be the training set at node N, containing s records that belong to m distinct classes C_i (i = 1, ..., m), and let s_i be the number of records in S belonging to class C_i. Before splitting, the total entropy of the system is:

$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i, \qquad p_i = \frac{s_i}{s}$$

The total entropy is thus the average information -log_2 p_i of each class, weighted by the class proportions p_i.
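As a worked check with hypothetical counts (not data from this paper), suppose a node holds s = 14 records, s_1 = 9 of class C_1 and s_2 = 5 of class C_2:

$$I(9, 5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.410 + 0.531 = 0.940 \text{ bits}$$

A node containing a single class has entropy 0, and two equally frequent classes give the two-class maximum of 1 bit.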

Let A be an attribute with v distinct values {a_1, a_2, ..., a_v}. A divides S into v subsets {S_1, S_2, ..., S_v}, where S_j = {x | x ∈ S, x.A = a_j}. If A is selected as the test attribute, these subsets correspond to the branches growing from the node that contains S. Let s_ij be the number of records of class C_i in S_j. Splitting on each value of A (or, more generally, on subsets of A's values), the total entropy of the system after the split is:

$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + s_{2j} + \cdots + s_{mj}}{s} \, I(s_{1j}, s_{2j}, \ldots, s_{mj})$$

E(A) is the weighted average of the information of the subsets. The information gain obtained by splitting node N on attribute A is:

$$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A)$$
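The formulas for I, E(A), and Gain(A) can be checked with a short Python sketch. It assumes a hypothetical count-table input: one list of class counts [s_1j, ..., s_mj] per value a_j of A. The helper names `info`, `expected_info`, and `gain` are illustrative only.

```python
import math

def info(counts):
    """I(s_1, ..., s_m): entropy in bits of a class-count vector."""
    total = sum(counts)
    # Classes with zero records contribute nothing (0 * log 0 is taken as 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def expected_info(partition):
    """E(A): `partition` holds one class-count vector per attribute value a_j."""
    s = sum(sum(counts) for counts in partition)
    return sum(sum(counts) / s * info(counts) for counts in partition)

def gain(partition):
    """Gain(A) = I(s_1, ..., s_m) - E(A)."""
    # The class totals s_i are the column sums of the partition table.
    class_totals = [sum(column) for column in zip(*partition)]
    return info(class_totals) - expected_info(partition)

# Hypothetical attribute with three values over two classes (9 vs. 5 records overall).
partition = [[2, 3], [4, 0], [3, 2]]
print(round(info([9, 5]), 3))              # 0.94
print(round(expected_info(partition), 3))  # 0.694
print(round(gain(partition), 3))           # 0.247
```

The pure subset [4, 0] contributes zero entropy, which is what pulls E(A) below the unsplit value and makes the gain positive.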

In the correlation analysis method, the information gain of each attribute describing the samples in S is computed, and a relevance threshold A0 is set to identify weakly correlated attributes. An attribute whose information gain is below this threshold is considered weakly correlated and is removed.
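A minimal sketch of this filtering step in Python, computing Gain(A) directly from records and keeping only attributes whose gain reaches the threshold A0. The record layout (dicts with a `class` key), the function names, and the example threshold of 0.1 are assumptions for illustration, not values from the paper.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """I(s_1, ..., s_m) for a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(records, attribute, label="class"):
    """Gain(A) = I(S) - E(A), computed directly from the records."""
    groups = defaultdict(list)  # value a_j -> class labels of subset S_j
    for row in records:
        groups[row[attribute]].append(row[label])
    n = len(records)
    expected = sum(len(g) / n * entropy(g) for g in groups.values())  # E(A)
    return entropy([row[label] for row in records]) - expected

def relevant_attributes(records, attributes, threshold, label="class"):
    """Keep attributes whose gain is at least the threshold A0; drop weak ones."""
    gains = {a: info_gain(records, a, label) for a in attributes}
    return {a: g for a, g in gains.items() if g >= threshold}

# Hypothetical records: income separates the classes, region does not.
records = [
    {"income": "high", "region": "north", "class": "yes"},
    {"income": "high", "region": "south", "class": "yes"},
    {"income": "low", "region": "north", "class": "no"},
    {"income": "low", "region": "south", "class": "no"},
]
print(relevant_attributes(records, ["income", "region"], threshold=0.1))
# {'income': 1.0}
```

The choice of A0 is data dependent; the sketch simply applies whatever value is passed in.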

3.3 Attribute-oriented induction

The data warehouse of the CRM system holds a descriptive table of the goods purchased by customers, recording the product purchased, time, place, customer age, customer income, and so on. Each customer's purchase behavior can be described with concept trees and summarized by generalizing up those trees. Concept-tree-based induction is essentially a tuple-merging process, that is, a form of data preprocessing. Its basic idea is: (1) replace a specific value of an attribute with its parent node in that attribute's concept tree (this step is called attribute generalization); (2) merge identical tuples into more general macro-tuples and record the number of original tuples each macro-tuple covers; (3) if the number of macro-tuples is still large, replace values with still more general ancestors in the concept tree, finally producing a small number of macro-tuples with broad coverage.

Once the concept trees have been defined, all the data covered by the concept definitions in the database can be collected into a single data set. The tuple-merging principle is then applied: the condition-attribute values of the data set are generalized according to their concept trees, and macro-tuples are merged until their number meets the requirement.
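The generalize-and-merge loop can be sketched in Python as follows. Concept trees are represented, as an assumption, by a child-to-parent dictionary per attribute; which attribute to climb next is not specified in the text, so the sketch uses a hypothetical heuristic (the attribute with the most distinct values). The cities, age ranges, and `max_tuples` limit are illustrative.

```python
from collections import Counter

def generalize(tuples, concept_trees, max_tuples):
    """Attribute-oriented induction over concept trees (sketch).

    tuples: list of dicts mapping attribute -> value.
    concept_trees: attribute -> {child value: parent value}; a value with no
        parent entry is already at the top of its concept tree.
    Returns a Counter mapping each macro-tuple to the number of tuples it covers.
    """
    current = [dict(t) for t in tuples]
    while True:
        # Merge identical tuples; each count is the number of tuples covered.
        merged = Counter(tuple(sorted(t.items())) for t in current)
        if len(merged) <= max_tuples:
            return merged
        # Attributes that still have values with a parent in their concept tree.
        candidates = [a for a, tree in concept_trees.items()
                      if any(t[a] in tree for t in current)]
        if not candidates:
            return merged  # nothing left to generalize
        # Assumed heuristic: climb the attribute with the most distinct values.
        attr = max(candidates, key=lambda a: len({t[a] for t in current}))
        tree = concept_trees[attr]
        for t in current:
            t[attr] = tree.get(t[attr], t[attr])  # replace value by its parent

# Hypothetical concept trees and purchase tuples.
concept_trees = {
    "city": {"Beijing": "North", "Tianjin": "North",
             "Shanghai": "East", "Hangzhou": "East"},
    "age": {23: "20-29", 27: "20-29", 45: "40-49", 48: "40-49"},
}
purchases = [
    {"city": "Beijing", "age": 23},
    {"city": "Tianjin", "age": 27},
    {"city": "Shanghai", "age": 45},
    {"city": "Hangzhou", "age": 48},
]
for macro, count in generalize(purchases, concept_trees, max_tuples=2).items():
    print(dict(macro), count)
# {'age': '20-29', 'city': 'North'} 2
# {'age': '40-49', 'city': 'East'} 2
```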

4 Uptree classification mining

Techniques currently used for classification mining include decision tree classification, Bayesian classification, and neural network classification. Among them, decision tree classification is the most widely used: it learns relatively quickly, and the resulting tree can be converted into easily understood classification rules. This paper improves and optimizes the SLIQ classification algorithm; the resulting algorithm is called Uptree.

4.1 Design of the Uptree algorithm

Uptree builds the decision tree using pre-sorting and breadth-first growth, and prunes the tree while it is being generated. Pre-sorting reduces the time spent sorting numeric fields, and breadth-first growth allows all leaf nodes of the current tree to be split in the same traversal of the data.

Uptree's data structures are several disk-resident attribute lists and a single memory-resident class list. Each attribute has an attribute list indexed by rid (record identifier). Each tuple is represented by links from one entry in each attribute list to an entry in the class list (which stores the tuple's class label), and each class-list entry is in turn linked to its corresponding leaf node in the decision tree, as shown in Figure 2.

Figure 2 Attribute lists and class list used by Uptree
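A minimal in-memory sketch of these structures in Python: each attribute list holds (value, rid) pairs sorted once by value, and the class list, indexed by rid, stores each record's class label and current leaf. Keeping everything in memory (rather than on disk), the `ClassListEntry` type, and the example split `age <= 26` are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class ClassListEntry:
    label: str  # class label of the record
    leaf: int   # id of the tree leaf the record currently belongs to

def build_lists(records, attributes, label="class"):
    """One pre-sorted attribute list per attribute, holding (value, rid) pairs,
    plus a single class list indexed by rid."""
    class_list = [ClassListEntry(row[label], leaf=0) for row in records]  # 0 = root
    attribute_lists = {
        attr: sorted(((row[attr], rid) for rid, row in enumerate(records)),
                     key=lambda pair: pair[0])
        for attr in attributes
    }
    return attribute_lists, class_list

# Hypothetical records.
records = [
    {"age": 30, "income": 50, "class": "yes"},
    {"age": 22, "income": 80, "class": "no"},
    {"age": 41, "income": 20, "class": "yes"},
]
attribute_lists, class_list = build_lists(records, ["age", "income"])
print(attribute_lists["age"])  # [(22, 1), (30, 0), (41, 2)]

# Scanning an attribute list follows each rid into the class list; applying the
# split age <= 26 only updates the leaf pointer of each record.
for value, rid in attribute_lists["age"]:
    class_list[rid].leaf = 1 if value <= 26 else 2
print([entry.leaf for entry in class_list])  # [2, 1, 2]
```

Because the attribute lists stay sorted, a split never re-sorts them; only the small class list changes.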

4.2 Uptree's splitting index

Unlike many general decision tree algorithms, Uptree uses the Gini index for attribute selection; it applies to both categorical and numeric fields. For each node, the best split is computed first, and then the split is performed.

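If a set T of N records is divided into two subsets T_1 and T_2 containing N_1 and N_2 records, the Gini index of the split is:

$$\mathrm{gini}_{\mathrm{split}}(T) = \frac{N_1}{N}\,\mathrm{gini}(T_1) + \frac{N_2}{N}\,\mathrm{gini}(T_2), \qquad \mathrm{gini}(T) = 1 - \sum_{j} p_j^2$$

where p_j is the relative frequency of class j in T. The split with the minimum Gini index is selected as the splitting criterion (for each attribute, all possible splits must be examined).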
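The Gini formulas translate directly into Python. This sketch takes class-count vectors as input; the function names are illustrative, and treating an empty subset as pure is an assumption made to avoid division by zero.

```python
def gini(counts):
    """gini(T) = 1 - sum_j p_j^2 for the class counts of set T (0.0 if T is empty)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0

def gini_split(counts_1, counts_2):
    """Weighted Gini of splitting T into T_1 (N_1 records) and T_2 (N_2 records)."""
    n1, n2 = sum(counts_1), sum(counts_2)
    n = n1 + n2
    return n1 / n * gini(counts_1) + n2 / n * gini(counts_2)

print(gini([5, 5]))                          # 0.5
print(gini_split([5, 0], [0, 5]))            # 0.0
print(round(gini_split([3, 2], [2, 3]), 2))  # 0.48
```

A perfectly separating split scores 0, so minimizing the split Gini favors purer children.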

Numeric attributes are split with tests of the form A ≤ v. The values of a numeric field can therefore be sorted first; let the sorted values be v_1, v_2, ..., v_n. Because a split can only fall between two adjacent values, there are n - 1 candidate splits. The midpoint (v_i + v_{i+1})/2 is usually taken as the split point. The candidate split points are evaluated from smallest to largest, and the one with the smallest Gini index is chosen.
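The scan over sorted values can be sketched in Python: after one sort, class counts on the left side are updated incrementally, each midpoint between distinct adjacent values is scored with the weighted Gini, and the minimum is kept. Skipping positions between equal values is a detail assumed here; the function name and data are hypothetical.

```python
from collections import Counter

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0

def best_numeric_split(values, labels):
    """Return (split point, weighted Gini) for the best test A <= v."""
    pairs = sorted(zip(values, labels))  # sort the numeric field once
    classes = sorted(set(labels))
    total = Counter(labels)
    left = Counter()
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(n - 1):
        left[pairs[i][1]] += 1  # record i moves to the left side
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no split point between equal values
        split = (pairs[i][0] + pairs[i + 1][0]) / 2  # midpoint (v_i + v_{i+1}) / 2
        g = ((i + 1) / n * gini([left[c] for c in classes])
             + (n - i - 1) / n * gini([total[c] - left[c] for c in classes]))
        if g < best[1]:
            best = (split, g)
    return best

# Hypothetical ages and purchase labels.
ages = [25, 32, 47, 51, 62, 23]
buys = ["no", "no", "yes", "yes", "yes", "no"]
print(best_numeric_split(ages, buys))  # (39.5, 0.0)
```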

For a categorical (discrete) attribute A, let S(A) be the set of all possible values of A. The split test has the form A ∈ S', where S' ranges over the subsets of S(A). The Gini index is computed for each split into S' and S(A) - S', and the best split is the one with the smallest Gini index.
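A Python sketch of the exhaustive subset search follows. Since S' and S(A) - S' describe the same split, it only enumerates proper subsets containing the first value. The number of subsets grows exponentially with |S(A)|, so large value sets usually call for a greedy heuristic instead; the function name and example data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0

def best_categorical_split(values, labels):
    """Return (S', weighted Gini) for the best test A in S'."""
    domain = sorted(set(values))  # S(A)
    classes = sorted(set(labels))
    n = len(values)
    best = (None, float("inf"))
    rest = domain[1:]
    # Subsets containing domain[0]; k < len(rest) excludes the trivial S' = S(A).
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            subset = {domain[0], *extra}
            left = Counter(lab for v, lab in zip(values, labels) if v in subset)
            right = Counter(lab for v, lab in zip(values, labels) if v not in subset)
            n_left = sum(left.values())
            g = (n_left / n * gini([left[c] for c in classes])
                 + (n - n_left) / n * gini([right[c] for c in classes]))
            if g < best[1]:
                best = (subset, g)
    return best

# Hypothetical purchase city and class label per record.
cities = ["Beijing", "Shanghai", "Beijing", "Guangzhou", "Shanghai", "Guangzhou"]
labels = ["yes", "no", "yes", "yes", "no", "yes"]
subset, g = best_categorical_split(cities, labels)
print(sorted(subset), g)  # ['Beijing', 'Guangzhou'] 0.0
```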

4.3 Uptree algorithm flow

The control structure of the algorithm is a queue, which stores the current leaf nodes that still need to be split.
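A minimal Python sketch of queue-driven, breadth-first tree growth follows. It assumes numeric attributes only, in-memory data, and no pruning (the synchronous pruning and the remaining steps of the algorithm flow are not described in this section). Unlike SLIQ-style implementations, which evaluate all queued leaves in one pass over the attribute lists, this sketch handles one leaf at a time for readability. All names (`grow_tree`, `best_split`, `classify`, `min_size`) and the data are hypothetical.

```python
from collections import Counter, deque

def gini(counts):
    """gini(T) = 1 - sum_j p_j^2; an empty set is treated as pure."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0

def best_split(rows, labels, rids, attributes):
    """Best numeric test A <= v over all attributes for the records in `rids`.

    Returns (attribute, split point, weighted Gini), or None if no split exists.
    """
    classes = sorted({labels[r] for r in rids})
    total = Counter(labels[r] for r in rids)
    n = len(rids)
    best = None
    for attr in attributes:
        order = sorted(rids, key=lambda r: rows[r][attr])
        left = Counter()
        for i in range(n - 1):
            left[labels[order[i]]] += 1
            v, nxt = rows[order[i]][attr], rows[order[i + 1]][attr]
            if v == nxt:
                continue  # cannot split between equal values
            g = ((i + 1) / n * gini([left[c] for c in classes])
                 + (n - i - 1) / n * gini([total[c] - left[c] for c in classes]))
            if best is None or g < best[2]:
                best = (attr, (v + nxt) / 2, g)
    return best

def grow_tree(rows, labels, attributes, min_size=2):
    """Grow a binary tree breadth-first; the queue holds leaves awaiting a split."""
    tree = {0: {"rids": list(range(len(rows)))}}
    queue = deque([0])
    next_id = 1
    while queue:
        node = tree[queue.popleft()]
        node_labels = [labels[r] for r in node["rids"]]
        node["label"] = Counter(node_labels).most_common(1)[0][0]
        if len(set(node_labels)) == 1 or len(node_labels) < min_size:
            continue  # pure or too small: the node stays a leaf
        split = best_split(rows, labels, node["rids"], attributes)
        if split is None:
            continue
        attr, point, _ = split
        left_id, right_id = next_id, next_id + 1
        next_id += 2
        node["test"] = (attr, point)
        node["children"] = (left_id, right_id)
        tree[left_id] = {"rids": [r for r in node["rids"] if rows[r][attr] <= point]}
        tree[right_id] = {"rids": [r for r in node["rids"] if rows[r][attr] > point]}
        queue.extend([left_id, right_id])  # new leaves wait their turn (breadth-first)
    return tree

def classify(tree, row):
    node = tree[0]
    while "test" in node:
        attr, point = node["test"]
        left_id, right_id = node["children"]
        node = tree[left_id if row[attr] <= point else right_id]
    return node["label"]

# Hypothetical customer records and class labels.
rows = [
    {"age": 23, "income": 30},
    {"age": 25, "income": 80},
    {"age": 47, "income": 40},
    {"age": 52, "income": 90},
]
labels = ["no", "yes", "no", "yes"]
tree = grow_tree(rows, labels, ["age", "income"])
print(tree[0]["test"])                            # ('income', 60.0)
print(classify(tree, {"age": 30, "income": 85}))  # yes
```

Because every accepted split separates two distinct values, both children are non-empty and strictly smaller, so the queue always drains.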

Copyright © 2011 JIN SHI