Easy API for forest plots.
A Python package to make publication-ready but customizable forest plots.
This package makes publication-ready forest plots easy to produce out of the box. Users provide a dataframe
(e.g. from a spreadsheet) in which each row corresponds to a variable or study, with columns for the estimates, variable labels, and lower and upper confidence interval limits.
Additional options make it easy to add other dataframe columns as annotations in the plot.
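As a rough illustration of the expected input layout, the sketch below builds a tiny results table in CSV form and checks that each estimate lies within its interval. The column names (`label`, `r`, `ll`, `hl`) mirror the companion example data; the values are rounded from that data, and the helper `check_rows` is hypothetical and not part of the package.

```python
import csv
import io

# Hypothetical results table: one row per variable/study.
RAW = """label,r,ll,hl
in years,0.09,0.02,0.16
=1 if black,-0.03,-0.10,0.05
=1 if clerical worker,0.05,-0.03,0.12
"""

def check_rows(text):
    """Parse the CSV text and confirm ll <= estimate <= hl for every row."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        est, ll, hl = float(row["r"]), float(row["ll"]), float(row["hl"])
        if not ll <= est <= hl:
            raise ValueError(f"estimate outside interval for {row['label']!r}")
    return rows

rows = check_rows(RAW)
print(len(rows), "rows with valid intervals")
```

A file in this shape can be loaded into a pandas dataframe and passed to `fp.forestplot` with `estimate="r"`, `ll="ll"`, `hl="hl"`, and `varlabel="label"`.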
> - [Installation](#installation)
> - [Quick Start](#quick-start)
> - [Some Examples with Customizations](#some-examples-with-customizations)
> - [Gallery and API Options](#gallery-and-api-options)
> - [Known Issues](#known-issues)
> - [Background and Additional Resources](#background-and-additional-resources)
> - [Contributing](#contributing)
Install from PyPI:
pip install forestplot
Install from source:
git clone https://github.com/LSYS/forestplot.git
cd forestplot
pip install .
import forestplot as fp
df = fp.load_data("sleep") # companion example data
df.head(3)
| | var | r | moerror | label | group | ll | hl | n | power | p-val |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | age | 0.0903729 | 0.0696271 | in years | age | 0.02 | 0.16 | 706 | 0.671578 | 0.0163089 |
| 1 | black | -0.0270573 | 0.0770573 | =1 if black | other factors | -0.1 | 0.05 | 706 | 0.110805 | 0.472889 |
| 2 | clerical | 0.0480811 | 0.0719189 | =1 if clerical worker | occupation | -0.03 | 0.12 | 706 | 0.247768 | 0.201948 |
(* This is a toy example of how certain factors correlate with the amount of sleep one gets. See the notebook that generates the data.)
Make the forest plot
fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
ll="ll", hl="hl", # columns containing conf. int. lower and higher limits
varlabel="label", # column containing variable label
ylabel="Confidence interval", # y-label title
xlabel="Pearson correlation" # x-label title
)
fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
moerror="moerror", # columns containing conf. int. margin of error
varlabel="label", # column containing variable label
groupvar="group", # Add variable groupings
# group ordering
group_order=["labor factors", "occupation", "age", "health factors",
"family factors", "area of residence", "other factors"],
sort=True # sort in ascending order (sorts within group if group is specified)
)
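The ordering requested by `group_order` and `sort=True` can be pictured with plain Python. This is only a sketch of the idea (groups in the listed order, estimates ascending within each group), not the package's internal implementation; the helper `order_rows` and the sample rows are made up for illustration.

```python
def order_rows(rows, group_order):
    """Order rows by the position of their group in group_order, then by estimate."""
    rank = {group: i for i, group in enumerate(group_order)}
    return sorted(rows, key=lambda row: (rank[row["group"]], row["r"]))

# Hypothetical rows with a label, a group, and an estimate.
rows = [
    {"label": "b", "group": "occupation", "r": 0.05},
    {"label": "a", "group": "age", "r": 0.09},
    {"label": "c", "group": "occupation", "r": -0.02},
]
ordered = order_rows(rows, group_order=["occupation", "age"])
print([row["label"] for row in ordered])  # ['c', 'b', 'a']
```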
fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
ll="ll", hl="hl", # columns containing conf. int. lower and higher limits
varlabel="label", # column containing variable label
groupvar="group", # Add variable groupings
# group ordering
group_order=["labor factors", "occupation", "age", "health factors",
"family factors", "area of residence", "other factors"],
sort=True, # sort in ascending order (sorts within group if group is specified)
pval="p-val", # Column of p-value to be reported on right
color_alt_rows=True, # Gray alternate rows
ylabel="Est.(95% Conf. Int.)", # ylabel to print
**{"ylabel1_size": 11} # control size of printed ylabel
)
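The p-value column is formatted before it is printed on the right of the plot. The sketch below shows one common convention for attaching significance stars; the thresholds (0.01, 0.05, 0.1), the symbols, the three-decimal rounding, and the helper name `star_pval` are assumptions for illustration and may differ from the package's defaults.

```python
def star_pval(pval, thresholds=(0.01, 0.05, 0.1), symbols=("***", "**", "*"), decimals=3):
    """Format a p-value to fixed decimals and append stars for the first threshold it meets."""
    text = f"{pval:.{decimals}f}"
    for threshold, symbol in zip(thresholds, symbols):
        if pval <= threshold:
            return text + symbol
    return text

print(star_pval(0.0163089))  # 0.016**
print(star_pval(0.472889))   # 0.473
```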
fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
ll="ll", hl="hl", # lower & higher limits of conf. int.
varlabel="label", # column containing the varlabels to be printed on far left
pval="p-val", # column containing p-values to be formatted
annote=["n", "power", "est_ci"], # columns to report on left of plot
annoteheaders=["N", "Power", "Est. (95% Conf. Int.)"], # ^corresponding headers
rightannote=["formatted_pval", "group"], # columns to report on right of plot
right_annoteheaders=["P-value", "Variable group"], # ^corresponding headers
xlabel="Pearson correlation coefficient", # x-label title
table=True, # Format as a table
)
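The `est_ci` annotation combines the estimate and its interval into a single string. Below is a minimal sketch of such formatting, assuming a layout of `estimate(lower to upper)` and two decimal places; the exact string the package produces may differ, and `format_est_ci` is a hypothetical helper.

```python
def format_est_ci(estimate, ll, hl, decimal_precision=2):
    """Return a display string combining an estimate with its interval limits."""
    p = decimal_precision
    return f"{estimate:.{p}f}({ll:.{p}f} to {hl:.{p}f})"

print(format_est_ci(0.0903729, 0.02, 0.16))   # 0.09(0.02 to 0.16)
print(format_est_ci(-0.0270573, -0.1, 0.05))  # -0.03(-0.10 to 0.05)
```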
fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
ll="ll", hl="hl", # lower & higher limits of conf. int.
varlabel="label", # column containing the varlabels to be printed on far left
ci_report=False, # Turn off conf. int. reporting
flush=False, # Turn off left-flush of text
**{'fontfamily': 'sans-serif'} # revert to sans-serif
)
fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
ll="ll", hl="hl", # lower & higher limits of conf. int.
varlabel="label", # column containing the varlabels to be printed on far left
pval="p-val", # column containing p-values to be formatted
annote=["n", "power", "est_ci"], # columns to report on left of plot
annoteheaders=["N", "Power", "Est. (95% Conf. Int.)"], # ^corresponding headers
rightannote=["formatted_pval", "group"], # columns to report on right of plot
right_annoteheaders=["P-value", "Variable group"], # ^corresponding headers
groupvar="group", # column containing group labels
group_order=["labor factors", "occupation", "age", "health factors",
"family factors", "area of residence", "other factors"],
xlabel="Pearson correlation coefficient", # x-label title
xticks=[-.4,-.2,0, .2], # x-ticks to be printed
sort=True, # sort estimates in ascending order
table=True, # Format as a table
# Additional kwargs for customizations
**{"marker": "D", # set marker symbol as diamond
"markersize": 35, # adjust marker size
"xlinestyle": (0, (10, 5)), # long dash for x-reference line
"xlinecolor": ".1", # gray color for x-reference line
"xtick_size": 12, # adjust x-ticker fontsize
}
)
Check out this Jupyter notebook for a gallery of forest plot variations possible out of the box.
The table below shows the list of arguments users can pass in.
More fine-grained control over base plot options (e.g. font sizes, marker colors) can be inferred from the example notebook gallery.
| Option | Description | Required |
|---|---|---|
| `dataframe` | Pandas dataframe where rows are variables (or studies for meta-analyses) and columns include estimated effect sizes, labels, and confidence intervals, etc. | ✓ |
| `estimate` | Name of column in dataframe containing the estimates. | ✓ |
| `varlabel` | Name of column in dataframe containing the variable labels (study labels if meta-analyses). | ✓ |
| `ll` | Name of column in dataframe containing the conf. int. lower limits. | ✓* |
| `hl` | Name of column in dataframe containing the conf. int. higher limits. | ✓* |
| `moerror` | Name of column in dataframe containing the conf. int. margin of errors. | ✓* |
| `form_ci_report` | If True (default), report the estimates and confidence interval beside the variable labels. | |
| `ci_report` | If True (default), format the confidence interval as a string. | |
| `groupvar` | Name of column in dataframe containing the variable grouping labels. | |
| `group_order` | List of group labels indicating the order of groups to report in the plot. | |
| `annote` | List of columns to add as annotations on the left-hand side of the plot. | |
| `annoteheaders` | List of column headers for the left-hand side annotations. | |
| `rightannote` | List of columns to add as annotations on the right-hand side of the plot. | |
| `right_annoteheaders` | List of column headers for the right-hand side annotations. | |
| `pval` | Name of column in dataframe containing the p-values. | |
| `starpval` | If True (default), format p-values with stars indicating statistical significance. | |
| `sort` | If True, sort variables by estimate values in ascending order. | |
| `sortby` | Name of column to sort by. Default is `estimate`. | |
| `flush` | If True (default), left-flush variable labels and annotations. | |
| `decimal_precision` | Number of decimal places to print. (Default = 2) | |
| `figsize` | Tuple indicating core figure size. Default is (4, 8). | |
| `xticks` | List of xticklabels to print on x-axis. | |
| `ylabel` | Y-label title. | |
| `xlabel` | X-label title. | |
| `color_alt_rows` | If True, shade out alternating rows in gray. | |
| `preprocess` | If True (default), preprocess the dataframe before plotting. | |
| `return_df` | If True, return the preprocessed dataframe. | |
(* If `ll` and `hl` are specified, then `moerror` (margin of error) is not required, and vice versa.)
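The two ways of specifying an interval describe the same information. As a sketch, assuming a symmetric interval around the estimate (a common definition of a margin of error, not a statement about the package internals), the limits can be derived as follows. The helper `limits_from_moerror` is hypothetical; the numbers are the first two rows of the example data.

```python
def limits_from_moerror(estimate, moerror):
    """Return the (lower, upper) limits of a symmetric interval around the estimate."""
    return estimate - moerror, estimate + moerror

for est, moe in [(0.0903729, 0.0696271), (-0.0270573, 0.0770573)]:
    ll, hl = limits_from_moerror(est, moe)
    print(f"{est:.2f}: [{ll:.2f}, {hl:.2f}]")
# 0.09: [0.02, 0.16]
# -0.03: [-0.10, 0.05]
```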
Matplotlib API used.
monospace font.
More about forest plots:
Forest plots have many aliases (h/t Chris Alexiuk). Other names include coefplots, coefficient plots, meta-analysis plots, dot-and-whisker plots, blobbograms, margins plots, regression plots, and ropeladder plots.
Forest plots in the medical and health sciences literature are plots that report results from different studies as a meta-analysis. Markers are centered on the estimated effect, and the horizontal lines running through each marker depict the confidence intervals.
The simplest version of a forest plot has two columns: one for the variables/studies, and the second for the estimated coefficients and confidence intervals.
This layout is similar to coefficient plots (coefplots) and is thus useful for more than meta-analyses.
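To make the anatomy concrete, here is a rough text-only rendering of the simplest layout: a label column followed by a whisker (`-`) for the interval, a marker (`o`) at the estimate, and a reference line (`|`) at zero. It is an illustrative sketch, unrelated to how the package draws with matplotlib; `text_forest`, the axis range, and the width are arbitrary choices, and the rows are rounded from the example data.

```python
def text_forest(rows, lo=-0.2, hi=0.2, width=41):
    """Render (label, estimate, ll, hl) rows as text lines with whiskers and markers."""
    def col(x):
        # Map a value to a character column, clamped to the drawable range.
        raw = round((x - lo) / (hi - lo) * (width - 1))
        return max(0, min(width - 1, raw))

    lines = []
    for label, est, ll, hl in rows:
        cells = [" "] * width
        cells[col(0.0)] = "|"                  # reference line at zero
        for i in range(col(ll), col(hl) + 1):
            cells[i] = "-"                     # confidence interval whisker
        cells[col(est)] = "o"                  # marker at the estimate
        lines.append(f"{label:<10}" + "".join(cells))
    return lines

rows = [
    ("age", 0.09, 0.02, 0.16),
    ("black", -0.03, -0.10, 0.05),
    ("clerical", 0.05, -0.03, 0.12),
]
for line in text_forest(rows):
    print(line)
```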
* [[1]](https://doi.org/10.1038/s41433-021-01867-6) Chang, Y., Phillips, M.R., Guymer, R.H. et al. The 5 min meta-analysis: understanding how to read and interpret a forest plot. Eye 36, 673–675 (2022).
* [[2]](https://doi.org/10.1136/bmj.322.7300.1479) Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ 2001; 322:1479.
More about this package:
The package is lightweight, built on `pandas`, `numpy`, and `matplotlib`.
It is slightly opinionated in that the aesthetics of the plot inherit some of my sensibilities about what makes a nice figure.
You can, however, easily override most defaults for the look of the graph. This is possible via `**kwargs` in the `forestplot` API (see Gallery and API Options) and the `matplotlib` API.
Planned enhancements include forest plots where each row can have multiple coefficients (e.g. from multiple models).
* [[1]](https://www.stata-journal.com/article.html?article=gr0059) [Stata] Jann, Ben (2014). Plotting regression coefficients and other estimates. The Stata Journal 14(4): 708-737.
* [[2]](https://www.statsmodels.org/devel/examples/notebooks/generated/metaanalysis1.html) [Python] Meta-Analysis in statsmodels
* [[3]](https://github.com/seafloor/forestplot) [Python] Matt Bracher-Smith's Forestplot
* [[4]](https://github.com/fsolt/dotwhisker) [R] Solt, Frederick and Hu, Yue (2021) dotwhisker: Dot-and-Whisker Plots of Regression Results
* [[5]](https://rpubs.com/mbounthavong/forest_plots_r) [R] Bounthavong, Mark (2021) Forest plots. RPubs by RStudio
Contributions are welcome, and they are greatly appreciated!
Potential ways to contribute:
Issues
Please submit bugs, questions, or issues you encounter to the GitHub Issue Tracker.
For bugs, please provide a minimal reproducible example demonstrating the problem.