Network Analysis

In network analysis, the relationships between units are represented as links in a graph. From a data mining perspective, mining these relationships is called link mining or link analysis. A network is a diverse, multi-relational dataset in the form of a graph: nodes represent objects, and edges represent links, which denote the relationships between those objects. Such graphs are usually very large; telephone networks and the World Wide Web (WWW) are good examples. Network analysis also helps in filtering datasets and providing customer-preferred services. Because every network consists of numerous nodes, the datasets are enormous, so studying them and mining useful information from them helps in solving problems and transmitting data effectively.
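As a minimal sketch, assuming a plain in-memory Python representation (the class `MultiRelationalGraph` and its methods are hypothetical names, not part of any library), a multi-relational graph can store each edge under its relation type. Objects of different kinds, such as people and web pages, then share one graph while their different kinds of links stay distinguishable:

```python
from collections import defaultdict


class MultiRelationalGraph:
    """Directed graph whose edges carry a relation type (illustrative only)."""

    def __init__(self):
        self.node_types = {}  # node -> object type, e.g. "person" or "page"
        # relation name -> source node -> set of target nodes
        self.edges = defaultdict(lambda: defaultdict(set))

    def add_node(self, node, node_type):
        self.node_types[node] = node_type

    def add_edge(self, source, relation, target):
        self.edges[relation][source].add(target)

    def neighbors(self, node, relation):
        # .get() avoids creating empty entries for unknown relations or nodes
        return set(self.edges.get(relation, {}).get(node, set()))

    def relations(self):
        return sorted(self.edges)


# Hypothetical example: people who call each other and visit web pages
g = MultiRelationalGraph()
g.add_node("alice", "person")
g.add_node("bob", "person")
g.add_node("home.html", "page")
g.add_edge("alice", "calls", "bob")
g.add_edge("alice", "visits", "home.html")
g.add_edge("bob", "visits", "home.html")

print(g.neighbors("alice", "calls"))  # {'bob'}
print(g.relations())                  # ['calls', 'visits']
```

Keeping one adjacency map per relation makes it cheap to ask relation-specific questions, which a single homogeneous edge list cannot answer directly.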

Link Mining

Conventional machine learning methods work on homogeneous objects drawn from a single relation. This is not applicable to networks because of their large number of nodes and their multi-relational, heterogeneous nature. As a result, link mining has emerged as a new field out of several lines of research. Link mining is the convergence of research in graph mining, networks, hypertext, logic programming, link analysis, predictive analysis, and modeling. Links are the relationships between nodes in a network, and exploiting them makes the mining process more effective. Link mining involves the following tasks:

Link-based object classification: In link mining, an object's own attributes are not enough; the links and the traits of the linked nodes are also needed. Web page classification is a good example. There, the system predicts the category of a web page based on the words that occur on the page and on the anchor text, which is the clickable text of the hyperlinks that point to the page. Both serve as attributes in web-based classification. In general, the attributes can be anything related to the links and the linked pages.
Link type prediction: Based on the properties of the objects involved, the system predicts the type or purpose of a link. In organizations, it can help suggest interactive communication sessions between employees when needed. In online retail, it helps predict what a customer prefers to buy, which can improve recommendations and increase sales.
Object type prediction: Here the type of an object is predicted based on its attributes and properties, its links, and the traits of the objects linked to it. For example, in the restaurant domain, a similar method can predict whether a customer prefers ordering food or visiting the restaurant in person. It can also predict whether a customer prefers to be contacted by phone or by mail.
Link cardinality estimation: This task involves two kinds of estimation. The first predicts the number of links attached to an object. For example, the authority of a web page can be estimated from the number of links pointing to it, called in-links, while hub pages, which point to a set of other pages on the same topic, can be identified using out-links. Similarly, during a pandemic, following the links of an infected patient can lead to other patients, which helps control transmission. The second predicts the number of objects reachable along a path from an object. This is crucial for estimating the number of objects a query will return.
Predicting link existence: Link type prediction predicts the type of a link, whereas this task predicts whether a link exists between two objects at all. For instance, it can be used to predict whether a link exists between two web pages.
Object reconciliation: This task predicts whether two objects are the same, based on their attributes, traits, or links. It is also called identity uncertainty or record linkage. The same procedure is used in citation matching, information extraction, duplicate elimination, and object consolidation. For instance, it can help determine whether one website is a mirror of another.
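Several of the tasks above can be illustrated on a hypothetical toy web graph. The sketch below assumes a dictionary mapping each page to its out-links and uses simple heuristics chosen for illustration, not methods prescribed by the text: in-link and out-link counts for cardinality (authority and hub signals), breadth-first search for the number of objects reachable within a given number of hops, a common-neighbors count as a link-existence score, and Jaccard similarity of out-link sets as a mirror-detection score for object reconciliation. Practical systems typically feed such scores into learned models.

```python
from collections import deque

# Hypothetical toy web graph: page -> set of pages it links to (out-links)
WEB = {
    "a": {"b", "c"},
    "b": {"c"},
    "c": {"a"},
    "d": {"c"},
    "e": {"b", "c"},  # same out-links as "a", so a candidate mirror
}


def in_links(graph):
    """Cardinality: number of in-links per page (a simple authority signal)."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for t in targets:
            counts[t] = counts.get(t, 0) + 1
    return counts


def out_links(graph):
    """Cardinality: number of out-links per page (a simple hub signal)."""
    return {node: len(targets) for node, targets in graph.items()}


def reachable_count(graph, start, max_hops):
    """Number of distinct objects reachable from start within max_hops links."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return len(seen) - 1  # exclude the start node itself


def common_neighbor_score(graph, u, v):
    """Link existence: more shared link targets suggests a likelier link."""
    return len(graph.get(u, set()) & graph.get(v, set()))


def jaccard(graph, u, v):
    """Reconciliation: overlap of out-link sets (1.0 means identical)."""
    a, b = graph.get(u, set()), graph.get(v, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(in_links(WEB))                         # {'a': 1, 'b': 2, 'c': 4, 'd': 0, 'e': 0}
print(reachable_count(WEB, "d", 3))          # 3
print(common_neighbor_score(WEB, "a", "e"))  # 2
print(jaccard(WEB, "a", "e"))                # 1.0
```

Page "c" has the most in-links and so would rank as the strongest authority, while "a" and "e" link to exactly the same pages and would be flagged as possible mirrors.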

Challenges in Link Mining

Statistical compared to logical dependencies: Logical relationships between objects are represented by the graph-link structure, while statistical relationships are represented by probabilistic dependencies. Handling these two kinds of dependencies coherently is difficult in multi-relational data mining. One must find the logical dependencies between objects together with the probabilistic relationships between attributes. The space of possible dependencies is very large, which complicates the mathematical model used.
Collective classification and consolidation: Consider a model trained on class-labeled objects. In conventional classification, objects are classified based on their attributes only. When the trained model is then applied to unlabeled objects that are linked to one another, it cannot classify them reliably because of the correlations among those objects. This calls for an additional iterative step that consolidates each object's label based on the labels of the objects linked to it. This is collective classification.
Constructive use of labeled and unlabeled data: An emerging technique is to combine labeled and unlabeled data. Unlabeled data help identify the distribution of attributes. Links within the unlabeled data let us use the attributes of linked objects. Links between unlabeled and labeled data help establish dependencies, which improves the efficiency of inference.
Open compared to closed-world assumptions: Conventional methods assume that all possible objects or entities in the domain are known, which is the closed-world assumption. In real applications, however, the closed-world assumption is impractical. This calls for specification languages that define probability distributions over relational structures containing a varying set of objects.
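The iterative consolidation step of collective classification, which also shows how links between labeled and unlabeled objects are exploited, can be sketched as follows. This is a simplified assumption-laden version that uses only neighbor labels (a relational-neighbor majority vote); a fuller iterative classification algorithm would also feed each object's own attributes into a trained classifier. The function name `collective_classify` and the example graph are hypothetical.

```python
from collections import Counter


def collective_classify(neighbors, labels, max_iters=10):
    """
    neighbors: node -> list of linked nodes (undirected links)
    labels: node -> class label, for the labeled (training) nodes only
    Returns labels for all reachable nodes. Training labels stay fixed; each
    unlabeled node repeatedly takes the majority label among its neighbors.
    """
    current = dict(labels)
    for _ in range(max_iters):
        changed = False
        for node in sorted(neighbors):
            if node in labels:
                continue  # training labels are never overwritten
            votes = Counter(current[n] for n in neighbors[node] if n in current)
            if not votes:
                continue  # no labeled neighbor yet; retry next round
            # Majority vote; ties broken alphabetically for determinism
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if current.get(node) != best:
                current[node] = best
                changed = True
        if not changed:
            break  # labels are stable, so consolidation is complete
    return current


# Hypothetical linked pages: only p1 and p6 carry training labels
neighbors = {
    "p1": ["p2", "p3"],
    "p2": ["p1", "p3"],
    "p3": ["p1", "p2", "p4"],
    "p4": ["p3", "p5", "p6"],
    "p5": ["p4", "p6"],
    "p6": ["p4", "p5"],
}
labels = {"p1": "sports", "p6": "politics"}

result = collective_classify(neighbors, labels)
print(dict(sorted(result.items())))
# {'p1': 'sports', 'p2': 'sports', 'p3': 'sports', 'p4': 'politics', 'p5': 'politics', 'p6': 'politics'}
```

Because labels are updated in place, the outcome can depend on the update order and tie-breaking rule: in the first round p4 sees one "sports" and one "politics" neighbor, and it is only in the second round that its newly labeled neighbor p5 gives "politics" a clear majority.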

Community Mining

Network analysis includes finding groups of objects that share similar attributes. This process is known as community mining. In web page linkage, for example, a community is a group of web pages that follow a common theme. Many community mining algorithms assume that there is only one network and try to find communities under a single, homogeneous relationship. On the real-world web, however, there are multiple networks with heterogeneous relationships, which shows the need for multi-relational community mining.
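A minimal sketch of why combining relations can matter, under stated assumptions: two hypothetical relations (hyperlinks and co-citations) are merged into one weighted graph using user-chosen relation weights, and communities are then taken as connected components after dropping weak edges. Thresholded connected components is a deliberately simple stand-in for real community detection algorithms (for example modularity-based or label propagation methods); the function names are illustrative only.

```python
from collections import defaultdict


def merge_relations(relations, weights):
    """Combine several undirected relation networks into one weighted graph.
    relations: relation name -> list of (u, v) edges
    weights: relation name -> importance weight (hypothetical, user-chosen)
    """
    merged = defaultdict(float)
    for name, edges in relations.items():
        for u, v in edges:
            key = tuple(sorted((u, v)))  # undirected edge key
            merged[key] += weights.get(name, 1.0)
    return merged


def communities(merged, min_weight):
    """Connected components after dropping edges lighter than min_weight."""
    adj = defaultdict(set)
    nodes = set()
    for (u, v), w in merged.items():
        nodes.update((u, v))
        if w >= min_weight:
            adj[u].add(v)
            adj[v].add(u)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            group.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


relations = {
    "hyperlink": [("a", "b"), ("b", "c"), ("a", "c"), ("c", "x"),
                  ("x", "y"), ("y", "z"), ("x", "z")],
    "cocitation": [("a", "b"), ("b", "c"), ("a", "c"),
                   ("x", "y"), ("y", "z"), ("x", "z")],
}
weights = {"hyperlink": 1.0, "cocitation": 1.0}

merged = merge_relations(relations, weights)
print(communities(merged, min_weight=2.0))  # [['a', 'b', 'c'], ['x', 'y', 'z']]

hyperlink_only = merge_relations({"hyperlink": relations["hyperlink"]}, weights)
print(communities(hyperlink_only, min_weight=1.0))  # [['a', 'b', 'c', 'x', 'y', 'z']]
```

With hyperlinks alone, the single c-x link joins everything into one group; only when the co-citation relation is taken into account does the weak bridge stand out and two themed communities emerge.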

Data Mining Graphs and Networks

Data mining is the process of collecting and processing useful information from a large heap of unprocessed data. Once patterns are established, relationships between the datasets can be identified and presented in a summarized format, which supports statistical analysis in many industries. Among data structures, the graph is widely used for modeling complex structures and patterns. In data mining, graphs are used to find subgraph patterns for discrimination, classification, clustering, and so on. Graphs are also used in network analysis: by linking nodes, they model networks such as communication networks, the web, computer networks, and social networks. In multi-relational data mining, graphs or networks are used because of the varied interconnected relationships between the data in a relational database.
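Finding subgraph patterns usually rests on a support measure: the fraction of graphs in a database that contain a given pattern. Frequent subgraph miners enumerate many candidate patterns and keep those whose support exceeds a threshold; the sketch below only shows the support computation for one fixed pattern, a triangle, over a hypothetical database of four small undirected graphs. The function names are illustrative, not from any library.

```python
from itertools import combinations


def has_triangle(edges):
    """True if the undirected graph given as an edge list contains a 3-cycle."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for u in adj:
        # Two neighbors of u that are linked to each other close a triangle
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                return True
    return False


def support(graph_db, contains_pattern):
    """Fraction of graphs in the database that contain the pattern."""
    return sum(1 for g in graph_db if contains_pattern(g)) / len(graph_db)


# Hypothetical graph database: each graph is a list of undirected edges
db = [
    [(1, 2), (2, 3), (3, 1)],          # triangle
    [(1, 2), (2, 3), (3, 4)],          # path, no triangle
    [(1, 2), (2, 3), (3, 1), (3, 4)],  # triangle with a tail
    [(1, 2), (2, 3), (3, 4), (4, 1)],  # square, no triangle
]

print(support(db, has_triangle))  # 0.5
```

With a minimum support threshold of, say, 0.5, the triangle would count as a frequent pattern in this database and could then be used as a feature for classifying or clustering the graphs.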
