More about data science: cienciadedatos.net
- Decision Trees with Python: Regression and Classification
- Gradient Boosting with Python
- Machine learning with Python and Scikit-learn
- Grid search of Random Forest models with out-of-bag error and early stopping
Introduction
One of the latest trends in the field of artificial intelligence is Graph Machine Learning (GML). This discipline focuses on applying machine learning and statistical algorithms to the study of complex networks or graphs.
A graph is a data structure formed by nodes and links that allows representing relationships between objects. For example, in a social network, the nodes could represent users and the links the connection between them.
GML uses graphs to train machine learning models and to be able to make inferences about the nodes or links. The three main types of problems that can be solved with GML are:
Inference on nodes: predicting characteristics of nodes using the characteristics of their neighboring nodes.
Inference on links: predicting characteristics or existence of links, for example, inferring which links are most likely to occur in the future. This is a typical case of recommendation systems.
Inference on sets of nodes: identifying communities of nodes that present similar behaviors. This is a typical case in customer segmentation with similar characteristics.
In this series of articles, the aim is to show the basics of Graph Machine Learning and solve practical cases using the most important graph libraries in Python such as NetworkX PyTorch Geometric, Stellargraph or Graph Neural Networks.

Types of graphs
Graphs are classified based on their nodes and links. Several types of graphs can be differentiated depending on the type of relationship that exists between their nodes.
Undirected graphs
The links have no direction, that is, if a node is connected to a node , then is also connected to . An example of an undirected graph is a social network where users are connected to each other through friendship ties.

Directed graphs
The links have a direction, that is, if a node is connected to a node , is not necessarily connected to . An example of a directed graph is a telephone call network where the nodes are people and the edges show each call, taking into account who calls whom.

**Weighted graphs** The links have an associated weight, which represents the importance or intensity of the relationship between the nodes. An example of a weighted graph is a network of cities where the links represent transport routes and the weight represents the distance between cities.

Bipartite graphs
The nodes are divided into two disjoint sets (two types of nodes), and only links between nodes of different sets are allowed. An example of a bipartite graph is a network of movies and actors, where the nodes of one set are the movies and the nodes of the other set are the actors; and only links between movies and actors are allowed.

Adjacency Matrix
To be able to analyze a graph, it is necessary to represent it mathematically. There are two main ways to do this: the edgelist and the adjacency matrix.
The edgelist is simply a list in which all the connections are indicated. For example, in a directed graph with three nodes {, and }, where connects to and , the edgelist is {(,), (,)}. If the graph is undirected, the list must include the connections in both directions {(,), (,) (,), (,)}.
Another way to represent a graph is by means of what is known as an adjacency matrix. The adjacency matrix is a matrix of dimension , where is the number of nodes and where a 1 appears if the connection between a pair of nodes exists and a 0 otherwise. For weighted graphs, instead of a 1, the matrix presents the value of the weight of the connection. Due to its mathematical properties, the adjacency matrix is the most used method of representing graphs.
The disadvantage of the adjacency matrix is that for very large graphs, it can take up a lot of space. For this reason, a sparse adjacency matrix is frequently used, which internally only stores the information of the existing connections, that is, it does not store the zeros.
For undirected graphs, the adjacency matrix is symmetric since, if the connection between nodes and exists, the reciprocal connection will also exist.

With a mathematical formulation, the adjacency matrix of a directed graph of nodes has rows and columns, its elements being:
if there is a link pointing from node to node
if nodes and are not connected to each other
The adjacency matrix of an undirected network has two entries for each link. The link is represented as and . Therefore, the adjacency matrix of an undirected network is symmetric,
In the following figure, the different types of graphs are represented along with their adjacency matrices. As can be seen, directed graphs () are the only ones that do not have a symmetric adjacency matrix. In the case of bipartite graphs (), they can be projected to obtain the indirect connections between each type of node (), the latter will be seen in depth later.

Example of graphs of different types with their corresponding adjacency matrices. A) Undirected graph. B) Directed graph. C) Weighted graph. D) Bipartite graph, with two types of nodes (green and orange). E) Projection of the bipartite graph on the nodes of each type. Source: wikipedia.
Meta-information of nodes and edges
When working with graphs, especially if you want to make predictions or other types of analytics, it is very common to have additional information about the nodes and links. This information is called meta-information because it does not strictly belong to the graph and is usually stored as a table, whose rows are the nodes or links and its columns are the available variables for each one. If the social network example is used, the meta-information of the nodes could be the name, age, hobbies, etc...
Although less common than for nodes, links can also have meta-information. Continuing with the social network example, the meta-information of the edges could be the year in which the friendship relationship between two people began.
Bipartite Graphs
A bipartite graph is a network whose nodes can be divided into two disjoint sets and such that the links connect nodes of with nodes of . In other words, if the nodes are colored green and the nodes are colored purple, there are only links connecting nodes of different colors.

It is possible to generate two projections for each bipartite network. The first projection connects nodes if they are connected to the same node in the bipartite representation. The second projection connects nodes if they are connected to the same node in the bipartite representation.
Returning to the previous example of a bipartite network, in which one set of nodes corresponds to movies () and the other to actors (), and each movie is connected to the actors who act in it; the first projection of this network connects the actor nodes if they have acted in the same movie. The other projection connects the movies that share at least one actor in their cast.
Paths and distances
Physical distance plays a key role in the interactions between the components of systems. For example, the physical distance between two cities influences the number of visitors traveling from one city to another.
In networks, distance is a different concept from physical distance. If the question is asked: What is the distance between two users of a social network? Physical distance ceases to be relevant because two individuals living in the same building may not know each other and have very good friends in other parts of the world.
In networks, physical distance is replaced by path length. A path between two nodes is a route that starts at the first node and travels along the links of the network, passing through different nodes until it reaches the second node. The path length represents the number of links that the path between two nodes contains.
In network science, paths play a central role. Below are some of their most important properties:
Reciprocity
In an undirected network, the distance between node and node is the same as the distance between node and node ( = ). On the contrary, in a directed network, often . Furthermore, in a directed network, the existence of a path from node to node does not guarantee the existence of a path from to .
Shortest path between two nodes
In networks, determining the distance between two nodes is equivalent to identifying the shortest path between said nodes. The distance (or shortest path) between two nodes and can be calculated directly from the adjacency matrix . It is enough to multiply the adjacency matrix by itself as many times as the distance between two nodes.
: If there is a direct link between and , then ( otherwise
: If there is a path of length 2 between and , then otherwise). The number of paths between and is
The number of paths of length between and is
These equations are valid for both directed and undirected networks.
In other words, to obtain the distance of order 2 between any pair of nodes, it is enough to multiply the adjacency matrix by itself. If we multiply it by itself n times, we obtain the distance matrix of order n.
Thus, the distance matrix of order 2 will show us all the pairs of nodes that are connected with an intermediate node. In the following example, nodes 2 and 5 have nodes 4 and 1 in between, that is, there are two possible paths of order 2 that connect them: 2-4-5 and 2-1-5. For this reason, if the adjacency matrix is multiplied by itself, it is obtained that the element [2,5] of the distance matrix is 2. This means that there are two paths of distance 2 between nodes 2 and 5.

Properties of networks and nodes
The following are some important properties of networks and nodes in graph theory. In successive articles they will be explained in more detail, including their formulation and mathematical implications.
- Degree of a node: the number of links a node has. Directed graphs have two types of degree: "in-degree" for incoming connections and "out-degree" for outgoing connections.
- Neighborhood of a node: the nodes to which a node is directly connected.
- Average degree: the average of the degree of all nodes in a network.
- Density: the number of links in a network divided by the maximum number of possible links. Density provides an idea of how populated with links the network is.
- Connected components: groups of nodes connected to each other. Not all connections have to exist, but from any node you can travel to the rest of the nodes.
- Paths and cycles: a path is a sequence of links that connect two nodes, while a cycle is a path that begins and ends at the same node.
- Centrality: a measure of the importance of a node in a network, according to its degree, neighborhood, paths and cycles, among other factors.
- Clustering: a measure of the number of existing links between the neighbors of a node.
- Clique: a set of nodes all connected to each other.
- Subgraph: the resulting network when selecting only a series of nodes.
Node Centrality
Centrality is a concept used to measure the importance of nodes in a graph. The three most important measures of centrality are:
- Degree centrality: it is the simplest measure, it is based on the number of links a node has, the sum of links that enter and leave.
- Betweenness centrality: measures the number of paths that pass through a node.
- Closeness centrality: measures the average distance from a node to all other nodes in the graph.
To learn more about the properties of graphs, you can visit the next chapter of the XX series.
Graphs with NetworkX
NetworkX is one of the most used Python libraries for working with graphs and networks. NetworkX allows you to create, manipulate and analyze graphs efficiently.
One of the main advantages of NetworkX is its ability to work with graphs of great size and complexity, allowing you to handle graphs with millions of nodes and links. The library has a wide variety of functionalities that allow you to create, import and export graphs in multiple formats, as well as analyze the properties of these networks (average degree, density, clustering coefficient, the shortest path between two nodes, and many others). In addition, it has a series of algorithms to search for patterns, such as community detection, centrality detection and connected components detection. All these properties will be seen in successive articles.
NetworkX is also compatible with other Python libraries, such as NumPy, SciPy, Matplotlib or Pytorch, which allows you to easily integrate network analysis into a broader data analysis workflow.
Important Functions in NetworkX
Below is a list of some of the most used functions in NetworkX.
- Graph creation: NetworkX allows you to create empty graphs or graphs with specific nodes and links using functions like
Graph()(undirected graphs),DiGraph()(directed graphs), or even for sets of graphs withMultiGraph()andMultiDiGraph().
- Adding nodes and links: You can add nodes and links to an existing graph manually, either one by one with
add_node()andadd_edge(), or from a list withadd_nodes_from()andadd_edges_from(). They can also be added from a file or DataFrame withfrom_pandas_edgelist(). For weighted graphs, theadd_weighted_edges_fromfunction is used.
- Graph information: You can get basic information about the graph, such as the number of nodes and links, using the
number_of_nodes()andnumber_of_edges()functions. Usingnodes()andedges()you can access the meta-information of nodes and edges.
- Neighbors and degree: You can identify the neighbors and the degree of a node using the functions:
neighbors()anddegree(); for directed networks,in_degree()andout_degree()are used.
- Centrality: You can calculate centrality measures, such as degree centrality, betweenness centrality, and closeness centrality among nodes using the
degree_centrality(),betweenness_centrality()andcloseness_centrality()functions.
- Connected components: You can find the connected components of a graph using the
connected_components()function.
- Reading and writing files: NetworkX allows you to read and write graphs in various file formats using
read_edgelist(),write_edgelist(),read_adjlist()andwrite_adjlist().
- Optimization algorithms: NetworkX has integrated optimization algorithms such as the shortest path and the maximum flow. They are accessed with the
shortest_path()anddijkstra_path()functions.
- Subgraphs: You can generate subgraphs of a graph using the
subgraph()function, giving a list of nodes as input.
- Set operations: NetworkX allows you to perform set operations on graphs such as union, intersection and difference using
union(),intersection()anddifference().
- Adjacency matrix: With the
nx.adjacency_matrix(G)function you get the adjacency matrix of a graph. By default, it is in sparse format, to see it on the screen you have to use theto_dense()function from numpy.
- Graph plot: The
draw()function allows you to draw graphs using the Matplotlib library.
Creating Undirected Graphs
NetworkX allows you to create networks manually, adding nodes and edges one by one or from a file or a DataFrame containing the connections. The latter is especially useful when working with network data that has already been collected and is in a structured format.
Manual Creation
# Libraries
# ======================================================================================
import networkx as nx
import pandas as pd
import warnings
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
To create a graph in NetworkX, a "Graph" object must be created using the nx.Graph() function. This function creates an empty graph, without nodes or edges, to which elements can be added later.
# Creating a "Graph" type instance
# ======================================================================================
G = nx.Graph()
print(G)
Graph with 0 nodes and 0 edges
It is verified that it is an undirected graph.
G.is_directed()
False
Once the Graph object has been created, it can be populated with nodes and connections. Two methods are used for this:
add_node: adds a single node to the graph.
add_nodes_from: adds multiple nodes to the graph.
add_edge: adds an edge between nodes u and v. If the nodes do not exist, they are created and added to the graph automatically.
add_edges_from: same behavior asadd_edgebut using a collection of edges. Each edge is defined with a tuple (u, v).
The name of the nodes can be either numeric or characters.
# Add a single node
# ======================================================================================
fig, ax = plt.subplots(figsize=(1,1))
G.add_node("A")
nx.draw(G, with_labels=True, ax=ax)
print(G)
Graph with 1 nodes and 0 edges
# Add multiple nodes
# ======================================================================================
G.add_nodes_from(["B", "C"])
fig, ax = plt.subplots(figsize=(3, 2))
nx.draw(G, with_labels=True, ax=ax)
ax.set_xlim([1.2*x for x in ax.get_xlim()])
ax.set_ylim([1.2*y for y in ax.get_ylim()])
print(G)
Graph with 3 nodes and 0 edges
# Add a single edge
# ======================================================================================
G.add_edge("A", "B")
fig, ax = plt.subplots(figsize=(3, 2))
nx.draw(G, with_labels=True, ax=ax)
ax.set_xlim([1.2*x for x in ax.get_xlim()])
ax.set_ylim([1.2*y for y in ax.get_ylim()])
print(G)
Graph with 3 nodes and 1 edges
# Add multiple edges
# ======================================================================================
G.add_edges_from([("A", "C"), ("B", "C")])
fig, ax = plt.subplots(figsize=(3, 2))
nx.draw(G, with_labels=True, ax=ax)
ax.set_xlim([1.2*x for x in ax.get_xlim()])
ax.set_ylim([1.2*y for y in ax.get_ylim()])
print(G)
Graph with 3 nodes and 3 edges
If a connection is added whose nodes do not exist, they are created automatically.
G.add_edges_from([("D", "E"), ("E", "F")])
fig, ax = plt.subplots(figsize=(4, 4))
nx.draw(G, with_labels=True, ax=ax)
ax.set_xlim([1.2*x for x in ax.get_xlim()])
ax.set_ylim([1.2*y for y in ax.get_ylim()])
print(G)
Graph with 6 nodes and 5 edges
The corresponding adjacency matrix is:
adjM = nx.adjacency_matrix(G)
# The matrix is converted from sparse to dense format to be able to print it
adjM = adjM.todense()
adjM
array([[0, 1, 1, 0, 0, 0],
[1, 0, 1, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 1, 0]])
And the number of nodes and edges:
print(G.number_of_edges())
print(G.number_of_nodes())
5 6
The information of the nodes and edges of the graph is stored in the nodes and edges attributes.
print(f"Graph nodes: {G.nodes}")
print(f"Graph edges: {G.edges}")
Graph nodes: ['A', 'B', 'C', 'D', 'E', 'F']
Graph edges: [('A', 'B'), ('A', 'C'), ('B', 'C'), ('D', 'E'), ('E', 'F')]
Creation from a DataFrame
To create a graph from a pandas dataframe, the information has to be structured in such a way that one column represents the start of each edge and another the destination. For example, to represent that there are two nodes ("A" and "B") connected to each other, a row is needed that contains the value "A" in one column and "B" in another. This information is sufficient for the two nodes and the connection between them to be created.
# Dataframe with the graph connections
# ======================================================================================
connections = pd.DataFrame(
{
"source": ["A", "B", "C"],
"target": ["C", "C", "D"],
}
)
connections
| source | target | |
|---|---|---|
| 0 | A | C |
| 1 | B | C |
| 2 | C | D |
In the from_pandas_edgelist function, the source and target columns are indicated (for undirected graphs, a source is chosen indistinctly).
# Create a graph from a Dataframe
# ======================================================================================
fig, ax = plt.subplots(figsize=(3,2))
G = nx.from_pandas_edgelist(connections, source="source", target="target")
nx.draw(G, with_labels=True, ax=ax)
Creating directed graphs
Manual creation
As explained previously, the links of directed graphs have a defined direction (the links of these graphs are represented with an arrow). The process of creating a directed graph is equivalent to that of an undirected graph but using DiGraph instead of Graph.
# Creation of a directed graph".
# ======================================================================================
G = nx.DiGraph()
# Connections
G.add_edges_from([(1, 2), (2, 3), (1, 4), (1, 5), (4, 2), (5, 4)])
# Graphic representation
fig, ax = plt.subplots(figsize=(3,2))
nx.draw(G, with_labels=True, ax=ax)
It is verified that it is a directed graph.
G.is_directed()
True
Creation from a DataFrame
The process of creating a directed graph from a DataFrame is equivalent to that of undirected graphs shown in the previous section, with the only difference that the argument create_using = nx.DiGraph must be indicated.
# Create a directed graph from a Dataframe
# ======================================================================================
G = nx.from_pandas_edgelist(
connections,
source = "source",
target = "target",
create_using = nx.DiGraph
)
fig, ax = plt.subplots(figsize=(3,2))
nx.draw(G, with_labels=True, ax=ax)
Since it is a directed graph, its adjacency matrix is not symmetric.
adjM = nx.adjacency_matrix(G)
adjM = adjM.todense()
adjM
array([[0, 1, 0, 0],
[0, 0, 0, 1],
[0, 1, 0, 0],
[0, 0, 0, 0]])
Distance between two nodes
To obtain the distance of order 2 between any pair of nodes, it is enough to multiply the adjacency matrix by itself. If we multiply it by itself n times, we obtain the distance matrix of order n.
# Graph creation
G = nx.Graph()
# Add nodes to the graph
G.add_edges_from([(1, 2), (2, 3), (1, 4), (1, 5), (4, 2), (5, 4)])
# Graphic representation
fig, ax = plt.subplots(figsize=(3,2))
nx.draw(G, with_labels=True, ax=ax)
# Adjacency matrix
adjM = nx.adjacency_matrix(G).todense()
adjM
array([[0, 1, 0, 1, 1],
[1, 0, 1, 1, 0],
[0, 1, 0, 0, 0],
[1, 1, 0, 0, 1],
[1, 0, 0, 1, 0]])
The distance matrix of order 2 shows all pairs of nodes that are connected with an intermediate node. In the example, nodes 2 and 5 have nodes 4 and 1 in between, that is, there are two possible paths of order 2 that connect them: 2-4-5 and 2-1-5. For that reason, if the adjacency matrix is multiplied by itself, the element [2, 5] of the distance matrix has the value 2. This indicates that there are two paths of distance 2 between nodes 2 and 5.
# Matrix multiplication by itself
distances_order_two = adjM @ adjM
distances_order_two
array([[3, 1, 1, 2, 1],
[1, 3, 0, 1, 2],
[1, 0, 1, 1, 0],
[2, 1, 1, 3, 1],
[1, 2, 0, 1, 2]])
# Python indices start at 0
print(f"Paths of order two between nodes 2 and 5 = {distances_order_two[1, 4]}")
Paths of order two between nodes 2 and 5 = 2
You can also calculate the shortest path using the nx.shortest_path function directly.
nx.shortest_path(G, source=2, target=5)
[2, 1, 5]
Weighted graph
In a weighted graph, the edges of the graph have an associated weight. In the graphical representation of this type of network, the edges are usually shown with a different width depending on their weight.
# Graph creation
G = nx.Graph()
# Nodes and connections
G.add_weighted_edges_from(
[(1, 2, 0.5),
(2, 3, 0.9),
(1, 4, 0.1),
(1, 5, 0.75),
(4, 2, 0.01),
(5, 4, 0.3)]
)
G.edges(data=True)
EdgeDataView([(1, 2, {'weight': 0.5}), (1, 4, {'weight': 0.1}), (1, 5, {'weight': 0.75}), (2, 3, {'weight': 0.9}), (2, 4, {'weight': 0.01}), (4, 5, {'weight': 0.3})])
# It is verified that it is an undirected and weighted graph
G.is_directed()
False
nx.is_weighted(G)
True
# The weight of each edge is shown, as well as the nodes it connects
[a for a in G.edges(data=True)]
[(1, 2, {'weight': 0.5}),
(1, 4, {'weight': 0.1}),
(1, 5, {'weight': 0.75}),
(2, 3, {'weight': 0.9}),
(2, 4, {'weight': 0.01}),
(4, 5, {'weight': 0.3})]
To extract the weights, iterate over each edge and access the third element of the tuple.
weights = [a[2]["weight"] for a in G.edges(data=True)]
weights
[0.5, 0.1, 0.75, 0.9, 0.01, 0.3]
To draw a weighted graph with NetworkX, it is necessary to draw nodes and edges separately. First, a graph layout is used that defines the position of the nodes. Once their position is defined, the nodes and edges are represented. For this example, the spring_layout is used, but there are many more.
# Define the position of the nodes using a layout
pos = nx.spring_layout(G)
# Represent nodes and edges
fig, ax = plt.subplots(figsize=(3,2))
nx.draw_networkx_nodes(
G,
pos = pos,
ax = ax
)
nx.draw_networkx_edges(
G,
pos = pos,
edgelist = G.edges,
width = weights,
ax = ax
);
Other layout types

Representing nodes and edges separately allows for more control over visual features, for example, the color of nodes and edges.
pos = nx.circular_layout(G)
fig, ax = plt.subplots(figsize=(3,3))
nx.draw_networkx_nodes(G, pos=pos, node_color="red", ax=ax)
nx.draw_networkx_edges(G, pos=pos, edgelist=G.edges, width=weights, edge_color="blue", ax=ax);
Bipartite graph
When the nodes of a graph represent entities of a different nature, the term bipartite graph is used. A common example of this type of graph is publication networks where there are "article" type nodes and "writer" type nodes.
In this type of graph, connections can only occur between nodes of a different nature, each writer is connected to the articles they have written. There can be no direct connections between articles or between writers.
To create a bipartite graph in NetworkX, the add_nodes_from() method is used to add the nodes, indicating the type with the bipartite argument, and then the add_edges_from() method is used to add the relationships between them. Below is an example of how to create a bipartite graph with two sets of nodes called "group A" and "group B".
# Create an empty bipartite graph
G_movies_actors = nx.Graph()
# Add the nodes of each group
G_movies_actors.add_nodes_from(["Movie_1", "Movie_2", "Movie_3"], bipartite="Movies")
G_movies_actors.add_nodes_from(["Actor_1", "Actor_2", "Actor_3"], bipartite="Actors")
# Add the relationships between the nodes
G_movies_actors.add_edges_from(
[
("Movie_1", "Actor_1"),
("Movie_2", "Actor_2"),
("Movie_3", "Actor_3"),
("Movie_2", "Actor_3"),
("Movie_2", "Actor_1"),
]
)
G_movies_actors.nodes(data=True)
NodeDataView({'Movie_1': {'bipartite': 'Movies'}, 'Movie_2': {'bipartite': 'Movies'}, 'Movie_3': {'bipartite': 'Movies'}, 'Actor_1': {'bipartite': 'Actors'}, 'Actor_2': {'bipartite': 'Actors'}, 'Actor_3': {'bipartite': 'Actors'}})
To access the type of each node, iterate through each node and access the "bipartite" attribute.
node_type = [
G_movies_actors.nodes[i]["bipartite"]
for i in G_movies_actors.nodes()
]
node_type
['Movies', 'Movies', 'Movies', 'Actors', 'Actors', 'Actors']
# colors for each type of node in the bipartite graph
colors = {"Movies": "red", "Actors": "blue"}
node_colors = [colors[n] for n in node_type]
fig, ax = plt.subplots(figsize=(5, 4))
nx.draw(
G_movies_actors,
pos=nx.spring_layout(G_movies_actors),
with_labels=True,
node_color=node_colors,
ax=ax,
)
In the code above, the spring_layout function is used to position the nodes in the graph. With the argument with_labels=True, the node labels are displayed, and with node_color, the color of each node is assigned.
Once the bipartite graph is created, a projection of it can be made to obtain a non-bipartite graph that contains only the nodes of one of the sets. To do this, the project() method is used, specifying the set of nodes to be included in the projection. Below, a projection of the bipartite graph is generated to obtain only the nodes of the 'Actors' group.
# The nodes of the set of interest are identified
nodes_bipartite = [
n[0]
for n in G_movies_actors.nodes(data=True)
if n[1]["bipartite"] == "Actors"
]
# Projection of the bipartite graph to obtain only the nodes of group A
G_actors = nx.bipartite.projected_graph(G_movies_actors, nodes_bipartite)
fig, ax = plt.subplots(figsize=(4, 4))
nx.draw(G_actors, with_labels=True, ax=ax)
ax.set_xlim([1.2 * x for x in ax.get_xlim()])
ax.set_ylim([1.2 * y for y in ax.get_ylim()])
(-1.2399190268447968, 1.275018745745212)
Meta-information: attributes of nodes and edges
In most real graph problems, additional information about nodes and edges is available. Node attributes are added with the networkx.set_node_attributes(Graph, dictionary, name) method and edge attributes are added with the networkx.set_edge_attributes() method.
# Graph creation
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (1, 4), (1, 5), (4, 2), (5, 4)])
G.edges(data=True)
EdgeDataView([(1, 2, {}), (1, 4, {}), (1, 5, {}), (2, 3, {}), (2, 4, {}), (4, 5, {})])
# Add node attributes
node_name = {1: "Jaime", 2: "María", 3: "Julio", 4: "Rosa", 5: "Alberto"}
node_hobbies = {
1: ["Soccer"],
2: ["Dancing", "Paddle"],
3: ["Golf", "Dancing"],
4: ["Cooking"],
5: ["Cooking", "Ham"],
}
nx.set_node_attributes(G, node_name, name="Name")
nx.set_node_attributes(G, node_hobbies, name="Hobbies")
# add edge attributes
edges_weight = {
(1, 2): 0.5,
(2, 3): 0.9,
(1, 4): 0.1,
(1, 5): 0.75,
(4, 2): 0.01,
(5, 4): 0.3,
}
nx.set_edge_attributes(G, edges_weight, name="weight")
To access the attributes of nodes and edges (meta-information), G.nodes(data=True) or G.edges(data=True) is used.
These commands return a dictionary-like structure where the key is the name of the node and the value contains all the attributes.
G.nodes(data=True)
NodeDataView({1: {'Name': 'Jaime', 'Hobbies': ['Soccer']}, 2: {'Name': 'María', 'Hobbies': ['Dancing', 'Paddle']}, 3: {'Name': 'Julio', 'Hobbies': ['Golf', 'Dancing']}, 4: {'Name': 'Rosa', 'Hobbies': ['Cooking']}, 5: {'Name': 'Alberto', 'Hobbies': ['Cooking', 'Ham']}})
G.edges(data=True)
EdgeDataView([(1, 2, {'weight': 0.5}), (1, 4, {'weight': 0.1}), (1, 5, {'weight': 0.75}), (2, 3, {'weight': 0.9}), (2, 4, {'weight': 0.01}), (4, 5, {'weight': 0.3})])
The attributes of edges and nodes can be iterated directly as if it were a dictionary.
for m, n, w in G.edges(data=True):
print(
f"Edge connecting node {m} with node {n} and has a weight of {w['weight']}."
)
Edge connecting node 1 with node 2 and has a weight of 0.5. Edge connecting node 1 with node 4 and has a weight of 0.1. Edge connecting node 1 with node 5 and has a weight of 0.75. Edge connecting node 2 with node 3 and has a weight of 0.9. Edge connecting node 2 with node 4 and has a weight of 0.01. Edge connecting node 4 with node 5 and has a weight of 0.3.
Advanced network visualization
There are other libraries with which much larger networks can be visualized very efficiently and interactively. In future articles, the main alternatives will be shown in detail.

Session Information
import session_info
session_info.show(html=False)
----- matplotlib 3.10.7 networkx 3.5 pandas 2.3.3 session_info v1.0.1 ----- IPython 9.6.0 jupyter_client 7.4.9 jupyter_core 5.9.1 notebook 6.5.7 ----- Python 3.13.9 | packaged by Anaconda, Inc. | (main, Oct 21 2025, 19:16:10) [GCC 11.2.0] Linux-6.14.0-35-generic-x86_64-with-glibc2.39 ----- Session information updated at 2025-11-21 00:13
Bibliography
Network Science - Albert-László Barabási
https://ericmjl.github.io/Network-Analysis-Made-Simple/01-introduction/03-viz/
Citation Instructions
How to cite this document?
If you use this document or any part of it, we would appreciate it if you would cite it. Thank you very much!
Introduction to graphs and networks with Python by Fernando Carazo and Joaquín Amat Rodrigo, available under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) license at https://www.cienciadedatos.net/documentos/pygml01-introduction-graphs-networks-python.html
Did you like the article? Your help is important
Your contribution will help me to continue generating free outreach content. Thank you very much! 😊
This document created by Joaquín Amat Rodrigo is licensed under Attribution-NonCommercial-ShareAlike 4.0 International.
You are free to:
-
Share: copy and redistribute the material in any medium or format.
-
Adapt: remix, transform, and build upon the material.
Under the following terms:
-
Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
-
NonCommercial: You may not use the material for commercial purposes.
-
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
