Visualizing Data with pyCirclize: A Guide to Circular Plots
pyCirclize is a Python package for creating circular visualizations. Inspired by the R package “circlize”, it builds on matplotlib to draw several kinds of circular plots, including Circos plots, chord diagrams, and radar charts.
In this article, we will implement examples using pyCirclize to demonstrate its capabilities in creating circular visualizations. We’ll cover everything from the basics of installing pyCirclize to advanced use cases like human genome Circos plots and circular heatmaps.
Table of Contents
- Understanding Circular Visualization
- Circular Visualization Implementation using pyCirclize
- Installing pyCirclize in Python
- Example 1: Circular Visualization Layout
- Example 2: Visualizing Data on Circular Track
- Example 3: Link Visualization Between Circular Plot
- Example 4: Chord Diagram from Matrix
- Example 5: Chord Diagram from From-To Table
- Example 6: Human Genome Circos Plot
- Example 7: Circular Histogram Visualization
- Example 8: Circular Heatmap in Python
- Key Features and Customization Options for Circular Plots
Understanding Circular Visualization
Circular visualization is a method of representing data in a circular layout rather than a traditional linear one. This approach is particularly useful for displaying relationships and patterns in data, especially when there are multiple variables or complex connections involved. In Python, we can create circular visualizations using a library called pyCirclize. This library provides an easy way to generate circular plots with Python code.
We can customize the colors, labels, and other aspects of a plot to make it clear and informative. Circular visualization is especially popular in genomics, where it is used to show connections between different parts of a genome, but it applies equally to other kinds of data, from social networks to hierarchical structures.
This approach is particularly effective for:
- Relationship Visualization: Highlighting connections between different elements in networks, flows, or hierarchies.
- Multi-Variable Exploration: Visualizing multiple variables or datasets simultaneously in a compact format.
- Pattern Recognition: Identifying cyclical patterns or periodic events within data.
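To make the idea of a circular layout concrete, here is a minimal sketch of how sector sizes can be turned into angular spans. It does not use pyCirclize; the function name `sector_angles` is hypothetical, and the rule it applies (one fixed gap after every sector, with the remaining degrees shared in proportion to size) is an assumption about a typical layout rather than a description of the library's internals.

```python
def sector_angles(sectors, space=0.0):
    """Return {name: (start_deg, end_deg)} for a full 360-degree layout.

    Assumption: one gap of `space` degrees follows every sector, and the
    remaining degrees are shared in proportion to sector size.
    """
    total_size = sum(sectors.values())
    usable_deg = 360.0 - space * len(sectors)
    if total_size <= 0 or usable_deg <= 0:
        raise ValueError("sector sizes and space must leave room to draw")

    angles = {}
    cursor = 0.0
    for name, size in sectors.items():
        span = usable_deg * size / total_size
        angles[name] = (cursor, cursor + span)
        cursor += span + space  # move past this sector and its gap
    return angles


layout = sector_angles({"X": 8, "Y": 14, "Z": 10, "W": 18, "V": 13}, space=4)
for name, (start, end) in layout.items():
    print(f"{name}: {start:6.1f} -> {end:6.1f} degrees")
```

A sector that is twice as large as another receives twice the arc, which is what lets a reader compare categories by eye.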
Circular Visualization Implementation using pyCirclize
Installing pyCirclize in Python
To install pyCirclize, use the following pip command:
pip install pycirclize
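After installing, it can be useful to confirm that the package is visible to the interpreter we are actually running. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, and it simply reports `None` when the package is missing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("pycirclize")
print(f"pycirclize version: {found}" if found else "pycirclize is not installed")
```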
Example 1: Circular Visualization Layout
- Circular data visualization simplifies understanding complex relationships in data, like networks and flows.
- It is useful for analyzing data split into sectors, each representing a category. Within each sector, we can add multiple tracks for plotting different data points.
from pycirclize import Circos
# Define sectors
sectors = {"X": 8, "Y": 14, "Z": 10, "W": 18, "V": 13}
circos = Circos(sectors, space=4)
for sector in circos.sectors:
    # Plot sector axis and name
    sector.axis(fc="none", ls="solid", lw=1.5, ec="grey", alpha=0.6)
    sector.text(f"Sector: {sector.name}={sector.size}", size=14)
    # Track 1 (Radius: 70 - 100)
    track1 = sector.add_track((70, 100))
    track1.axis(fc="orange", alpha=0.6)
    track1.text(track1.name)
    # Track 2 (Radius: 40 - 65)
    track2 = sector.add_track((40, 65))
    track2.axis(fc="lightblue", alpha=0.6)
    track2.text(track2.name)
    # Track 3 (Radius: 10 - 35)
    track3 = sector.add_track((10, 35))
    track3.axis(fc="lightgreen", alpha=0.6)
    track3.text(track3.name)
circos.savefig("example_custom01.png")
Output:
Example 2: Visualizing Data on Circular Track
- Tracks come with diverse plotting functionalities. Users can plot lines, points, and bars using track.line(), track.scatter(), and track.bar() methods respectively.
- Additionally, x-ticks can be plotted using track.xticks_by_interval().
from pycirclize import Circos
import numpy as np
np.random.seed(2)
sectors = {"Alpha": 8, "Beta": 14, "Gamma": 10, "Delta": 18, "Epsilon": 13}
circos = Circos(sectors, space=4)
for sector in circos.sectors:
    # Plot sector name
    sector.text(f"Sector: {sector.name}", r=105, size=14)
    # Generate x positions and random y values for plotting
    x = np.arange(sector.start, sector.end) + 0.5
    y = np.random.randint(0, 100, len(x))
    # Plot line
    line_track = sector.add_track((70, 100), r_pad_ratio=0.1)
    line_track.axis()
    line_track.xticks_by_interval(1)
    line_track.line(x, y)
    # Plot points
    points_track = sector.add_track((40, 65), r_pad_ratio=0.1)
    points_track.axis()
    points_track.scatter(x, y)
    # Plot bar
    bar_track = sector.add_track((10, 35), r_pad_ratio=0.1)
    bar_track.axis()
    bar_track.bar(x, y)
circos.savefig("custom_data_plotting.png")
Output:
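To see how a data value ends up at a particular distance from the center, the sketch below maps a value linearly onto a track's radius range. It does not call pyCirclize. The helper `value_to_radius` is hypothetical, and the way it treats `r_pad_ratio` (half of the padding removed from each edge of the track) is an assumption made for illustration, so the exact numbers may differ from what the library draws.

```python
def value_to_radius(y, vmin, vmax, r_lim, r_pad_ratio=0.0):
    """Linearly map a data value onto a radius inside a track.

    Assumption: half of the padding (r_pad_ratio * track height) is removed
    from each edge of the track before the mapping is applied.
    """
    r_min, r_max = r_lim
    pad = (r_max - r_min) * r_pad_ratio / 2
    plot_min, plot_max = r_min + pad, r_max - pad
    if vmax == vmin:
        return (plot_min + plot_max) / 2  # flat data sits mid-track
    fraction = (y - vmin) / (vmax - vmin)
    return plot_min + fraction * (plot_max - plot_min)


for y in (0, 50, 100):
    r = value_to_radius(y, vmin=0, vmax=100, r_lim=(70, 100), r_pad_ratio=0.1)
    print(f"y={y:3d} -> r={r:.2f}")
```

Under these assumptions, the padding keeps the smallest and largest values slightly inside the track border instead of touching it.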
Example 3: Link Visualization Between Circular Plot
pyCirclize also facilitates plotting links within or between sectors, allowing visualization of relationships such as networks and flows. Users can customize the appearance of each link.
from pycirclize import Circos
sectors = {"Alpha": 8, "Beta": 18, "Gamma": 13}
name2color = {"Alpha": "purple", "Beta": "green", "Gamma": "orange"}
circos = Circos(sectors, space=4)
for sector in circos.sectors:
    track = sector.add_track((90, 100))
    track.axis(fc=name2color[sector.name])
    track.text(sector.name, color="black", size=11)
    track.xticks_by_interval(1)
# Plot various styles of links
circos.link(("Alpha", 0, 1), ("Alpha", 5, 6))
circos.link(("Alpha", 2, 3), ("Alpha", 6, 5), color="yellow")
circos.link(("Alpha", 7, 8), ("Beta", 2, 1), direction=1, color="magenta")
circos.link(("Beta", 3, 5), ("Gamma", 4, 6), direction=1, ec="blue", lw=1, hatch="\\\\")
circos.link(("Beta", 15, 13), ("Beta", 10, 12), r1=85, r2=85, color="pink", ec="black", lw=2, ls="dotted")
circos.link(("Gamma", 0, 2), ("Beta", 1, 0), direction=1, color="lightblue")
circos.link(("Gamma", 10, 12), ("Alpha", 3, 2), direction=2, color="brown", ec="grey", lw=1, ls="dashed")
circos.savefig("custom_link_visualization.png")
Output:
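Each link endpoint in the example is a `(sector_name, start, end)` tuple, and it is easy to mistype a position that falls outside its sector. The following sketch checks a region before it is handed to a plotting call. `check_region` is a hypothetical helper, not part of pyCirclize, and it assumes sectors run from 0 to their size, as they do when built from a name-to-size dictionary.

```python
def check_region(region, sectors):
    """Raise ValueError if a (sector_name, start, end) region is out of range."""
    name, start, end = region
    if name not in sectors:
        raise ValueError(f"unknown sector: {name!r}")
    size = sectors[name]
    for pos in (start, end):
        if not 0 <= pos <= size:
            raise ValueError(f"{pos} is outside sector {name!r} (0 to {size})")
    return region


sectors = {"Alpha": 8, "Beta": 18, "Gamma": 13}
check_region(("Beta", 15, 13), sectors)       # fine: both ends within 0..18
try:
    check_region(("Alpha", 7, 9), sectors)    # 9 exceeds Alpha's size of 8
except ValueError as err:
    print(err)
```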
Example 4: Chord Diagram from Matrix
- Chord diagrams visualize pairwise relationships between objects.
- Here, a chord diagram is created from a matrix, where each row and column represents an object, and the matrix cells represent the strength of relationships between objects.
from pycirclize import Circos
import pandas as pd
# Create matrix data (8 x 8)
row_names = list("ABCDEFGH")
col_names = row_names
matrix_data = [
    [45, 110, 50, 15, 115, 120, 110, 170],
    [100, 130, 150, 160, 80, 210, 70, 100],
    [100, 50, 70, 120, 80, 110, 100, 110],
    [60, 130, 25, 180, 190, 170, 70, 90],
    [200, 110, 45, 50, 190, 90, 10, 50],
    [80, 5, 95, 110, 115, 160, 105, 10],
    [160, 90, 100, 140, 95, 40, 90, 150],
    [85, 80, 130, 70, 115, 20, 35, 250],
]
matrix_df = pd.DataFrame(matrix_data, index=row_names, columns=col_names)
# Initialize from matrix
circos = Circos.initialize_from_matrix(
    matrix_df,
    space=2,
    r_lim=(90, 100),
    cmap="tab20",
    ticks_interval=400,
    label_kws=dict(r=92, size=11, color="black"),
)
circos.savefig("custom_chord_matrix.png")
Output:
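A chord diagram has to decide how much arc each object receives. One common convention, sketched below in plain Python, gives every object an arc proportional to its outgoing total (row sum) plus its incoming total (column sum). This is an assumption for illustration and is not confirmed here as pyCirclize's exact rule; the helper name `sector_sizes` and the small 3 x 3 matrix are hypothetical.

```python
def sector_sizes(names, matrix):
    """Return {name: outgoing total + incoming total} for a square matrix."""
    sizes = {}
    for i, name in enumerate(names):
        outgoing = sum(matrix[i])                   # row i: links leaving `name`
        incoming = sum(row[i] for row in matrix)    # column i: links arriving
        sizes[name] = outgoing + incoming
    return sizes


names = ["A", "B", "C"]
matrix = [
    [0, 10, 5],
    [2, 0, 8],
    [4, 1, 0],
]
print(sector_sizes(names, matrix))
```

Because every link is counted once where it leaves and once where it arrives, the sizes add up to twice the sum of all matrix cells.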
Example 5: Chord Diagram from From-To Table
- As in the previous example, we create a chord diagram, but this time from a from-to table.
- The table contains pairs of objects and the strength of relationships between them.
from pycirclize import Circos
from pycirclize.parser import Matrix
import pandas as pd
# Create from-to table dataframe & convert to matrix
fromto_table_df = pd.DataFrame(
    [
        ["Alpha", "Beta", 12],
        ["Alpha", "Gamma", 6],
        ["Alpha", "Delta", 18],
        ["Alpha", "Epsilon", 22],
        ["Alpha", "Zeta", 4],
        ["Beta", "Alpha", 4],
        ["Beta", "Eta", 18],
        ["Zeta", "Delta", 15],
        ["Zeta", "Epsilon", 3],
        ["Epsilon", "Alpha", 22],
        ["Epsilon", "Delta", 8],
    ],
    columns=["from", "to", "value"],
)
matrix = Matrix.parse_fromto_table(fromto_table_df)
circos = Circos.initialize_from_matrix(
    matrix,
    space=2,
    cmap="plasma",
    ticks_interval=4,
    label_kws=dict(size=11, r=105),
    link_kws=dict(direction=1, ec="grey", lw=0.5),
)
circos.savefig("custom_chord_fromto.png")
Output:
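Conceptually, converting a from-to table into a matrix is a pivot: every name that appears as a source or target becomes both a row and a column, and each value lands in the cell for its pair. The sketch below shows that idea in plain Python. `fromto_to_matrix` is a hypothetical helper, and its choices (names ordered by first appearance, repeated pairs summed) are assumptions rather than a description of `Matrix.parse_fromto_table`.

```python
def fromto_to_matrix(rows):
    """Pivot (source, target, value) rows into a square nested dict.

    Names are ordered by first appearance; repeated pairs are summed.
    """
    names = []
    for source, target, _ in rows:
        for name in (source, target):
            if name not in names:
                names.append(name)

    matrix = {r: {c: 0 for c in names} for r in names}
    for source, target, value in rows:
        matrix[source][target] += value
    return names, matrix


rows = [
    ("Alpha", "Beta", 12),
    ("Alpha", "Gamma", 6),
    ("Beta", "Alpha", 4),
    ("Beta", "Eta", 18),
]
names, matrix = fromto_to_matrix(rows)
print(names)
print(matrix["Alpha"]["Beta"], matrix["Beta"]["Alpha"], matrix["Eta"]["Alpha"])
```

Note that "Eta" only ever appears as a target, yet it still gets its own row of zeros so that the matrix stays square.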
Example 6: Human Genome Circos Plot
- This example focuses on the human genome, using the hg38 example dataset loaded through load_eukaryote_example_dataset().
- The resulting Circos plot shows each chromosome as a sector with its cytoband pattern and position ticks, and draws links between regions on different chromosomes, giving an overview of how the genome is organized.
from pycirclize import Circos
from pycirclize.utils import ColorCycler, load_eukaryote_example_dataset
# Load hg38 dataset
chr_bed_file, cytoband_file, chr_links = load_eukaryote_example_dataset("hg38")
# Initialize Circos from BED chromosomes
circos = Circos.initialize_from_bed(chr_bed_file, space=2)
circos.text("Human Genome\n(hg38)", deg=300, r=140, size=11)
# Add cytoband tracks from cytoband file
circos.add_cytoband_tracks((90, 95), cytoband_file)
# Create chromosome color mapping
ColorCycler.set_cmap("rainbow")
chr_names = [s.name for s in circos.sectors]
colors = ColorCycler.get_color_list(len(chr_names))
chr_name2color = {name: color for name, color in zip(chr_names, colors)}
# Plot chromosome names & xticks
for sector in circos.sectors:
    sector.text(sector.name, r=110, size=9, color=chr_name2color[sector.name])
    sector.get_track("cytoband").xticks_by_interval(
        30000000,
        label_size=7,
        label_orientation="vertical",
        label_formatter=lambda v: f"{v / 1000000:.0f} Mb",
    )
# Plot chromosome links
for link in chr_links:
    region1 = (link.query_chr, link.query_start, link.query_end)
    region2 = (link.ref_chr, link.ref_start, link.ref_end)
    color = chr_name2color[link.query_chr]
    if link.query_chr in ("chr1", "chr7", "chr15") and link.query_chr != link.ref_chr:
        circos.link(region1, region2, color=color)
circos.savefig("genome_example02.png")
Output:
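The tick labels in this example come from two pieces: positions spaced at a fixed interval and a formatter that turns base pairs into megabases. The sketch below reproduces both in plain Python so they can be tried without downloading the dataset. `tick_positions` and `format_mb` are hypothetical names, and the 100 Mb chromosome length is made up to keep the output short.

```python
def tick_positions(length, interval):
    """Return tick positions from 0 up to `length`, spaced by `interval`."""
    return list(range(0, length + 1, interval))


def format_mb(position):
    """Format a base-pair position as whole megabases, e.g. 30000000 -> '30 Mb'."""
    return f"{position / 1_000_000:.0f} Mb"


# Hypothetical 100 Mb chromosome, chosen only to keep the output short.
for pos in tick_positions(100_000_000, 30_000_000):
    print(pos, "->", format_mb(pos))
```

`format_mb` has the same body as the lambda passed to `label_formatter` above, so it could be passed in its place if a named function is preferred.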
Example 7: Circular Histogram Visualization
- A circular histogram is an effective way to display frequency data or distributions in a circular format. This method can be particularly useful for visualizing periodic data or cyclical events.
- In this example, we will create a circular histogram that represents the frequency of certain events occurring within different sectors. Each sector represents a different category, and within each sector, the frequency of events is displayed as bars.
from pycirclize import Circos
import numpy as np
import matplotlib.pyplot as plt
# Define sectors with different sizes
sectors = {"Category A": 12, "Category B": 8, "Category C": 10, "Category D": 15, "Category E": 5}
# Initialize Circos with defined sectors
circos = Circos(sectors, space=5)
# Generate synthetic frequency data for each sector
np.random.seed(42)
frequencies = {
    "Category A": np.random.randint(1, 10, sectors["Category A"]),
    "Category B": np.random.randint(1, 10, sectors["Category B"]),
    "Category C": np.random.randint(1, 10, sectors["Category C"]),
    "Category D": np.random.randint(1, 10, sectors["Category D"]),
    "Category E": np.random.randint(1, 10, sectors["Category E"]),
}
# Add tracks and plot histograms
for sector in circos.sectors:
    sector_name = sector.name
    freq_data = frequencies[sector_name]
    track = sector.add_track((50, 100))
    track.axis(fc="lightgrey")
    track.bar(np.arange(len(freq_data)) + 0.5, freq_data, color="skyblue")
# Save and display the plot
circos.savefig("circular_histogram.png")
plt.show()
Output:
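The example above draws random frequencies directly. With real data, we would usually start from raw observations and count them into bins first. The sketch below shows one way to do that in plain Python; `bin_counts` is a hypothetical helper and the event times are invented for illustration.

```python
def bin_counts(values, start, end, n_bins):
    """Count values into n_bins equal-width bins over [start, end).

    Returns (centers, counts); values outside the range are ignored.
    """
    width = (end - start) / n_bins
    counts = [0] * n_bins
    for value in values:
        if start <= value < end:
            index = int((value - start) // width)
            counts[min(index, n_bins - 1)] += 1  # guard against float rounding
    centers = [start + (i + 0.5) * width for i in range(n_bins)]
    return centers, counts


# Hypothetical event times in hours of the day.
event_hours = [0.5, 1.2, 1.8, 9.0, 9.5, 13.3, 22.1, 23.9]
centers, counts = bin_counts(event_hours, start=0, end=24, n_bins=6)
print(centers)
print(counts)
```

The bin centers and counts play the same role as the x positions and `freq_data` passed to `track.bar()` in the example, provided the sector spans the same range as the bins.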
Example 8: Circular Heatmap in Python
- A circular heatmap visualizes data intensity in a circular layout, which can be useful for representing data distributions over time or across different categories.
- Here, we will create a circular heatmap to represent the intensity of data points within different sectors. Each sector represents a different category, and the intensity of data points is displayed using a heatmap color scale.
from pycirclize import Circos
import numpy as np
import matplotlib.pyplot as plt
# Define sectors
sectors = {"Sector 1": 10, "Sector 2": 15, "Sector 3": 20, "Sector 4": 10, "Sector 5": 15}
# Initialize Circos plot
circos = Circos(sectors, space=5)
# Generate synthetic heatmap data for each sector
np.random.seed(0)
heatmap_data = {
    "Sector 1": np.random.rand(sectors["Sector 1"]),
    "Sector 2": np.random.rand(sectors["Sector 2"]),
    "Sector 3": np.random.rand(sectors["Sector 3"]),
    "Sector 4": np.random.rand(sectors["Sector 4"]),
    "Sector 5": np.random.rand(sectors["Sector 5"]),
}
# Add heatmap tracks
for sector in circos.sectors:
    sector_name = sector.name
    data = heatmap_data[sector_name]
    heatmap_track = sector.add_track((75, 100))
    heatmap_track.axis(fc="none")
    heatmap_track.heatmap(data, cmap="viridis")
# Save and display the plot
circos.savefig("circular_heatmap.png")
plt.show()
Output:
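When a heatmap is drawn one sector at a time, colors are only comparable across sectors if every sector uses the same value range. The sketch below computes a shared minimum and maximum and scales values to the 0 to 1 range that colormaps typically expect. The helper names and the sample intensities are hypothetical. As far as we know, pyCirclize's `heatmap()` accepts `vmin` and `vmax` keyword arguments for fixing this range, but treat that as an assumption and check the version you have installed.

```python
def shared_range(groups):
    """Return the (min, max) across every value in a dict of lists."""
    flat = [v for values in groups.values() for v in values]
    return min(flat), max(flat)


def normalize(values, vmin, vmax):
    """Scale values to the 0-1 range used to look up colormap colors."""
    if vmax == vmin:
        return [0.5 for _ in values]  # constant data maps to mid-scale
    return [(v - vmin) / (vmax - vmin) for v in values]


# Hypothetical intensities for two sectors.
data = {"Sector 1": [2.0, 4.0], "Sector 2": [6.0, 10.0]}
vmin, vmax = shared_range(data)
for name, values in data.items():
    print(name, normalize(values, vmin, vmax))
```

With a shared range, the value 4.0 in the first sector maps lower on the color scale than 6.0 in the second, which would not be guaranteed if each sector were scaled on its own.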
Key Features and Customization Options for Circular Plots
- Circular Layouts: Suitable for genomic data, network data, and hierarchical data.
- Customization: Users can specify colors, sizes, labels, and other graphical elements.
- Data Integration: Supports data from Pandas DataFrames, NumPy arrays, and networkx graphs.
- Partitioning: Allows partitioning the circular plot into sectors for different datasets or categories.
- Data Annotation: Users can add text, lines, or other graphical elements to highlight specific features.
- Interactive Features: Includes hover tooltips and zooming capabilities.
- Efficiency: Designed to handle large datasets efficiently.
- High-Quality Output: Suitable for publication or presentation, with export options in PNG, PDF, and SVG formats.
Conclusion
Data visualization is crucial for understanding complex datasets, and circular visualization, showcased by tools like pyCirclize in Python, provides a robust method to uncover relationships and patterns within data. With its easy-to-use features, pyCirclize allows users to generate visually striking circular plots, enriching our comprehension of diverse datasets spanning fields such as genomics and network analysis. Through pyCirclize, researchers and analysts can effectively explore and convey intricate data, fostering insights and discoveries in their fields.