Fairlearn metrics

Fairlearn is an open-source, community-driven project to help practitioners assess and improve the fairness of artificial intelligence (AI) systems. The associated Python package has two components: metrics for assessing which groups are negatively impacted by a model and for comparing multiple models in terms of various fairness and performance metrics, and algorithms for mitigating unfairness in a variety of AI tasks and along a variety of fairness definitions. The fairlearn.metrics module provides the functionality for computing these metrics, with a particular focus on disaggregated metrics. Even a few lines of assessment code can show stakeholders the impact of upstream or downstream noise on fairness metrics, and other ways a technical system may be amplifying real harms.

Evaluating fairness-related metrics

First, Fairlearn provides fairness-related metrics that can be compared between groups and for the overall population. In a traditional model analysis we would look at metrics such as accuracy evaluated on the entire dataset; to understand how a model's predictions impact different groups, we instead evaluate the same metrics disaggregated by a sensitive feature. For this purpose, a metric is any function with signature f(y_true, y_pred, ...), where y_true are the true values and y_pred are the values predicted by a machine learning algorithm. Fairlearn supports a wide range of fairness metrics for assessing a model's impacts on different groups of people, covering both classification and regression tasks; the documentation describes demographic parity, equalized odds, and equal opportunity, among others, and these can be evaluated directly (see the next section).

Fairlearn provides the fairlearn.metrics.MetricFrame class to help with this quantification, and it can be integrated seamlessly into an existing modelling process. Using existing metric definitions from scikit-learn, such as sklearn.metrics.accuracy_score(), precision_score(), or recall_score(), we can evaluate metrics for subgroups within the data: the metrics and the data are passed to the MetricFrame constructor, and the sensitive feature is passed as the sensitive_features= argument to disaggregate the results. The overall property then holds each metric evaluated on the entire dataset, while the by_group property contains the metrics evaluated on each subgroup defined by the categories in the sensitive_features= argument; if the sensitive feature is sex, for example, we obtain results for males and females. Fairlearn's own fairlearn.metrics.selection_rate() and fairlearn.metrics.count() can be included alongside the scikit-learn metrics to display the selection rate and the number of data points in each subgroup (count() uses only the y_true argument to make this calculation). This applies to any kind of model that users may already have, as well as to models created with the techniques from the Mitigation section below. In the examples, the "Adult" census data set is downloaded with the fetch_adult() function in fairlearn.datasets (it is also available via fetch_openml() in sklearn.datasets); Fairlearn additionally ships the "Diabetes 130-Hospitals" data set via fetch_diabetes_hospital().
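The following sketch shows this workflow end to end. It assumes a recent Fairlearn release in which MetricFrame accepts a metrics= dictionary; the train/test split, the DecisionTreeClassifier, and the choice of "sex" as the sensitive feature are illustrative choices rather than anything prescribed by the library.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

from fairlearn.datasets import fetch_adult
from fairlearn.metrics import MetricFrame, count, selection_rate

# Load the Adult census data and choose a sensitive feature (illustrative).
data = fetch_adult(as_frame=True)
X = pd.get_dummies(data.data)          # one-hot encode categorical columns
y = (data.target == ">50K") * 1        # binary label: income above 50K
A = data.data["sex"]                   # sensitive feature used for disaggregation

X_train, X_test, y_train, y_test, A_train, A_test = train_test_split(
    X, y, A, test_size=0.3, random_state=0, stratify=y
)

# Any scikit-learn style estimator works here.
classifier = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Disaggregate several metrics by the sensitive feature.
metric_frame = MetricFrame(
    metrics={
        "accuracy": accuracy_score,
        "precision": precision_score,
        "recall": recall_score,
        "selection rate": selection_rate,
        "count": count,
    },
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=A_test,
)

print(metric_frame.overall)    # each metric on the whole test set
print(metric_frame.by_group)   # each metric per sensitive-feature group
```

Both overall and by_group are pandas objects, so the disaggregated results can be filtered, tabulated, or plotted like any other DataFrame or Series.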
For more information on fairness metrics, review the Common fairness metrics section.

Additionally, group metrics yield the minimum and maximum metric value and the groups for which these values were observed, as well as the difference and ratio between the maximum and the minimum. On a MetricFrame, difference(method='between_groups', errors='coerce') returns, for each underlying metric, the maximum absolute difference between the entries for each combination of sensitive features in the by_group property (errors='coerce' controls how non-scalar metric values are handled), and ratio() behaves analogously; see fairlearn.metrics.MetricFrame.difference() and fairlearn.metrics.MetricFrame.ratio() for details.

fairlearn.metrics also provides metric functions that return scalars, much like typical scikit-learn metrics:

- demographic_parity_difference(y_true, y_pred, *, sensitive_features, method='between_groups', sample_weight=None) calculates the demographic parity difference, defined as the difference between the largest and the smallest group-level selection rate, \(E[h(X) | A=a]\), across all values \(a\) of the sensitive feature. Demographic parity is also sometimes referred to as independence, group fairness, statistical parity, and disparate impact.
- demographic_parity_ratio calculates the demographic parity ratio, defined as the ratio between the smallest and the largest group-level selection rate.
- equalized_odds_difference and equalized_odds_ratio(y_true, y_pred, *, sensitive_features, method='between_groups', sample_weight=None) compute the analogous worst-case difference (and smallest ratio) over the group-level true positive and false positive rates; both return a float. Recent releases also accept an agg argument ("worst_case" or "mean") controlling how the two rates are combined.
- equal_opportunity_difference calculates the equal opportunity difference, defined as the difference between the largest and the smallest group-level true positive rate, \(E[h(X) | A=a, Y=1]\); equal_opportunity_ratio is the corresponding ratio between the smallest and the largest group-level true positive rate.

These functions quantify how much the predictions of a given classifier depart from the fairness criteria known as demographic parity, equalized odds, and equal opportunity. Note that they are only defined for classification; fairlearn.metrics.demographic_parity_difference(), for instance, cannot be applied to a regression target. The underlying building-block rates, selection_rate(y_true, y_pred, *, pos_label=1, sample_weight=None), true_positive_rate() (also called sensitivity, recall, or hit rate), false_positive_rate() (also called fall-out), and false_negative_rate() (also called miss rate), accept sample_weight (the sample weights) and pos_label (the label treated as the "good" outcome) arguments. Read more in the User Guide.
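Continuing the sketch above and reusing its hypothetical y_test, y_pred, and A_test variables, the scalar fairness metrics can be evaluated directly:

```python
from fairlearn.metrics import (
    demographic_parity_difference,
    demographic_parity_ratio,
    equalized_odds_difference,
    equalized_odds_ratio,
)

# Difference between the largest and smallest group-level selection rate.
dp_diff = demographic_parity_difference(y_test, y_pred, sensitive_features=A_test)

# Ratio between the smallest and largest group-level selection rate.
dp_ratio = demographic_parity_ratio(y_test, y_pred, sensitive_features=A_test)

# Worst-case disparity over group-level true/false positive rates.
eo_diff = equalized_odds_difference(y_test, y_pred, sensitive_features=A_test)
eo_ratio = equalized_odds_ratio(y_test, y_pred, sensitive_features=A_test)

print(f"demographic parity difference: {dp_diff:.3f}")
print(f"demographic parity ratio:      {dp_ratio:.3f}")
print(f"equalized odds difference:     {eo_diff:.3f}")
print(f"equalized odds ratio:          {eo_ratio:.3f}")
```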
Making derived metrics

Metric functions often return a single scalar value based on arguments that are vectors of scalars, and many higher-order machine learning algorithms (such as hyperparameter tuners) make use of scalar metrics when deciding how to proceed. For this reason fairlearn.metrics provides the make_derived_metric() function, which turns a disaggregated metric into a scalar one. It takes as parameters metric (a callable metric such as sklearn.metrics.recall_score(), sklearn.metrics.fbeta_score() with beta=0.6, or fairlearn.metrics.false_positive_rate()) and transform (a string indicating the type of transformation, e.g. 'difference' or 'group_min'), and returns a function that evaluates the base metric per group and then aggregates the results. The names of the pre-generated functions shipped in fairlearn.metrics are of the form fairlearn.metrics.<base_metric>_<transformation>, for example fairlearn.metrics.accuracy_score_difference and fairlearn.metrics.precision_score_group_min. A custom fairness metric, such as one based on negative predictive value (NPV), can be written as an ordinary callable and passed to make_derived_metric() in the same way. You could, of course, evaluate the various subgroup values individually; the derived metrics simply condense that comparison into a single number. The Making Derived Metrics notebook demonstrates the use of the fairlearn.metrics.make_derived_metric() function.
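As a sketch of make_derived_metric(), again reusing the hypothetical variables from the first snippet; the fbeta_score example with beta=0.6 mirrors the one quoted above, and the exact pass-through of extra keyword arguments such as beta may differ slightly between Fairlearn versions:

```python
from sklearn.metrics import fbeta_score, recall_score
from fairlearn.metrics import make_derived_metric

# Largest between-group difference in recall, reduced to a single scalar.
recall_difference = make_derived_metric(metric=recall_score, transform="difference")
print(recall_difference(y_test, y_pred, sensitive_features=A_test))

# Smallest group-level F-beta score; beta is forwarded to the underlying metric.
fbeta_group_min = make_derived_metric(metric=fbeta_score, transform="group_min")
print(fbeta_group_min(y_test, y_pred, beta=0.6, sensitive_features=A_test))
```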
More complex metrics

On the assessment side, Fairlearn's MetricFrame (introduced in the fairness-assessment section above) can be used with any metrics for supervised learning, including those for multi-class classification. For example, it is possible to pass sklearn.metrics.confusion_matrix() as one of the metric functions and supply multi-class data for y_true and y_pred; you could also, of course, evaluate the various labels individually with per-label metrics. The scalar fairness metrics of the previous section, by contrast, are designed for binary classification or regression. Beyond that, MetricFrame supports multiple sensitive features for assessing intersecting groups, control features for conditioning on a legitimate factor, and extra arguments (such as sample_weight or beta) routed to the individual metric functions; see the Intersecting Groups, Advanced Usage of MetricFrame, and More Complex Metrics sections of the User Guide.
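A brief sketch of these ideas, assuming the data, X_test, y_test, and y_pred objects from the first snippet and, purely for illustration, "race" as a second sensitive feature. Here confusion_matrix demonstrates a non-scalar metric rather than a multi-class problem, since the Adult target is binary:

```python
from sklearn.metrics import confusion_matrix, recall_score
from fairlearn.metrics import MetricFrame

# Intersecting groups: disaggregate by two sensitive features at once.
# X_test keeps the original DataFrame index, so the matching rows of the
# raw data can be looked up directly.
A_two_test = data.data.loc[X_test.index, ["sex", "race"]]

mf = MetricFrame(
    metrics={
        "recall": recall_score,
        "confusion matrix": confusion_matrix,   # a non-scalar metric is fine
    },
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=A_two_test,
)

print(mf.by_group)   # one row per (sex, race) combination
```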
Plotting and model comparison

More information about plotting metrics can be found in the plotting section of the User Guide. The by_group results of a MetricFrame can be plotted as grouped metrics, and the experimental plot_metric_frame function (imported from fairlearn.experimental.enable_metric_frame_plotting) takes a fairlearn.metrics.MetricFrame with precomputed metrics and metric errors, together with a conf_intervals argument telling it how to interpret the corresponding columns, and draws the metrics with error bars. There are many different choices for calculating such confidence intervals; the accompanying notebook simply uses a Wilson score interval.

fairlearn.metrics also enables a comparison of multiple models. plot_model_comparison(*, y_preds, y_true=None, sensitive_features=None, x_axis_metric=None, y_axis_metric=None, ...) creates a scatter plot in which each model is represented by a point, typically with a performance metric on one axis and a fairness metric on the other.

Fairlearn dashboard

The Fairlearn dashboard (fairlearn.widget.FairlearnDashboard(*, sensitive_features, y_true, y_pred, sensitive_feature_names=None)) was a Jupyter notebook widget for assessing how a model's predictions impact different groups (e.g., different ethnicities), and for comparing multiple models along different fairness and performance metrics. It has since been deprecated in favor of the MetricFrame-based assessment and plotting described above, with its interactive successor maintained in the separate raiwidgets package. To precompute fairness metrics for such a dashboard, a dashboard dictionary can be created using Fairlearn's metrics package: the _create_group_metric_set method has arguments similar to the dashboard constructor, except that the sensitive features are passed as a dictionary (to ensure that names are available).
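A sketch of a model comparison plot, assuming two hypothetical fitted classifiers and the earlier train/test data. The keyword arguments follow the signature quoted above, while point_labels and show_plot are optional display flags in recent releases:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

from fairlearn.metrics import demographic_parity_difference, plot_model_comparison

# Two hypothetical models to compare on the same test data.
clf_a = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4).fit(X_train, y_train)
clf_b = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_preds = {
    "decision tree": clf_a.predict(X_test),
    "logistic regression": clf_b.predict(X_test),
}

# Scatter plot: accuracy on the x-axis, demographic parity difference on the y-axis.
plot_model_comparison(
    y_preds=y_preds,
    y_true=y_test,
    sensitive_features=A_test,
    x_axis_metric=accuracy_score,
    y_axis_metric=demographic_parity_difference,
    point_labels=True,
    show_plot=True,
)
```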
Mitigation

Assessing baseline fairness with the metrics above is an important first step before applying any mitigation strategy. Alongside the metrics, Fairlearn provides a set of algorithms to help mitigate unfair behavior in models, built around fairness constraints for classification such as demographic parity, equalized odds, and equal opportunity. These mitigation algorithms largely follow the conventions of scikit-learn, meaning that they implement a fit method to train a model and a predict method to make predictions, with the sensitive features supplied as a keyword argument: fit(X, y, *, sensitive_features=None). The techniques fall into three families:

- Preprocessing. fairlearn.preprocessing.CorrelationRemover removes correlation between the sensitive features and the remaining features. The solution is found by centering the sensitive features, fitting a linear regression model from them to each non-sensitive feature, and reporting the residual.
- Reductions. The fairlearn.reductions package contains algorithms implementing the reductions approach to disparity mitigation. In this approach, disparity constraints are cast as Lagrange multipliers, which cause the reweighting and relabelling of the input data.
- Postprocessing. The fairlearn.postprocessing package contains methods that operate on a predictor rather than an estimator. The postprocessors learn from the training data how to adjust the predictor's output so that specified parity constraints are fulfilled; you can, for example, apply a postprocessing step to a model you already use.

Two further notes apply to some of these estimators: a callback may be supplied that is passed the estimator itself, the step number, the inputs X, the targets y, and the sensitive features, and if it returns True the optimization algorithm terminates; and currently, for discrete y and sensitive_features, all classes need to be passed in the first call to fit. Whatever the technique, the resulting models can be evaluated with exactly the same disaggregated metrics as any other model; a postprocessing sketch follows below.
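As an illustrative sketch of the postprocessing route, using ThresholdOptimizer with a demographic parity constraint and reusing the hypothetical classifier and data from the first snippet (the constraint, the objective, and prefit=True are choices made for this example, not requirements):

```python
from fairlearn.metrics import demographic_parity_difference
from fairlearn.postprocessing import ThresholdOptimizer

# Wrap the already-trained classifier and adjust its outputs so that the
# selection rates of the sensitive-feature groups match.
postprocessor = ThresholdOptimizer(
    estimator=classifier,
    constraints="demographic_parity",
    objective="accuracy_score",
    prefit=True,
)
postprocessor.fit(X_train, y_train, sensitive_features=A_train)
y_pred_mitigated = postprocessor.predict(X_test, sensitive_features=A_test)

# Re-run the same disaggregated assessment on the mitigated predictions.
print(demographic_parity_difference(y_test, y_pred_mitigated, sensitive_features=A_test))
```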
Contributing

Fairlearn is built and maintained by open-source contributors with a variety of backgrounds and expertise. Grounded in the understanding that fairness is a sociotechnical challenge, the project hopes to be shaped by a diverse community of stakeholders, ranging from data scientists, developers, and business decision makers to the people whose lives may be affected by the predictions of AI systems. Researchers, practitioners, and other stakeholders are encouraged to contribute feedback, fairness metrics and assessment tools, unfairness mitigation algorithms, visualizations, case studies, and other educational materials, so the toolkit can evolve as the community experiments and learns together; if you have a specific idea of what you'd like to see, please open a new feature request (see the Contributor Guide).

The "Diabetes 130-Hospitals" data set available through fairlearn.datasets.fetch_diabetes_hospital() represents 10 years of clinical care at 130 U.S. hospitals and delivery networks, collected from 1999 to 2008. Source: UCI Repository [1]; paper: Strack et al., 2014 [2].