classes package

Submodules

classes.entity module

class classes.entity.Entity(entity: DataFrame | Mapping[T, Iterable[T]] | Iterable[Iterable[T]] | None = None, data_cols: Sequence[T] = [0, 1], data: ndarray | None = None, static: bool = False, labels: OrderedDict[T, Sequence[T]] | None = None, uid: Hashable | None = None, weight_col: str | int | None = 'cell_weights', weights: Sequence[float] | float | int | str | None = 1, aggregateby: str | dict | None = 'sum', properties: DataFrame | dict[int, dict[T, dict[Any, Any]]] | None = None, misc_props_col: str = 'properties', level_col: str = 'level', id_col: str = 'id')[source]

Bases: object

Base class for handling N-dimensional data when building network-like models, e.g., Hypergraph

Parameters:
  • entity (pandas.DataFrame, dict of lists or sets, list of lists or sets, optional) – If a DataFrame with N columns, represents N-dimensional entity data (data table). Otherwise, represents 2-dimensional entity data (system of sets). TODO: Test for compatibility with list of Entities and update docs

  • data (numpy.ndarray, optional) – 2D M x N ndarray of ints (data table); sparse representation of an N-dimensional incidence tensor with M nonzero cells. Ignored if entity is provided.

  • static (bool, default=False) – If True, entity data may not be altered, and the state_dict will never be cleared. Otherwise, rows may be added to and removed from the data table, and updates will clear the state_dict.

  • labels (collections.OrderedDict of lists, optional) – User-specified labels in corresponding order to ints in data. Ignored if entity is provided or data is not provided.

  • uid (hashable, optional) – A unique identifier for the object

  • weights (str or sequence of float, optional) – User-specified cell weights corresponding to entity data. If sequence of floats and entity or data defines a data table, length must equal the number of rows. If sequence of floats and entity defines a system of sets, length must equal the total sum of the sizes of all sets. If str and entity is a DataFrame, must be the name of a column in entity. Otherwise, weight for all cells is assumed to be 1.

  • aggregateby ({'sum', 'last', 'count', 'mean', 'median', 'max', 'min', 'first', None}) – Name of function to use for aggregating cell weights of duplicate rows when entity or data defines a data table; default is 'sum'. If None, duplicate rows will be dropped without aggregating cell weights. Effectively ignored if entity defines a system of sets.

  • properties (pandas.DataFrame or doubly-nested dict, optional) – User-specified properties to be assigned to individual items in the data, i.e., cell entries in a data table; sets or set elements in a system of sets. See Notes for detailed explanation. If DataFrame, each row gives [optional item level, item label, optional named properties, {property name: property value}] (order of columns does not matter; see Notes for an example). If doubly-nested dict, {item level: {item label: {property name: property value}}}.

  • misc_props_col (str, default="properties") – Name of the column in properties that holds miscellaneous properties; see Notes for explanation.

  • level_col (str, default="level") – Name of the column in properties that holds the level index; see Notes for explanation.

  • id_col (str, default="id") – Name of the column in properties that holds the item name; see Notes for explanation.
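The relationship between a system of sets, the length of weights, and aggregateby can be illustrated without the library. The following is a plain-Python sketch under stated assumptions, not the Entity implementation: the helpers `flatten` and `aggregate` are hypothetical names, only a few aggregation functions are covered, and the None case is left out.

```python
def flatten(system_of_sets):
    """Turn {set label: [elements]} into (set label, element) rows."""
    return [(label, elem) for label, elems in system_of_sets.items() for elem in elems]


def aggregate(rows, weights, how="sum"):
    """Combine the weights of duplicate rows (hypothetical stand-in for aggregateby)."""
    funcs = {
        "sum": sum,
        "count": len,
        "max": max,
        "min": min,
        "first": lambda ws: ws[0],
        "last": lambda ws: ws[-1],
    }
    grouped = {}
    for row, weight in zip(rows, weights):
        grouped.setdefault(row, []).append(weight)
    return {row: funcs[how](ws) for row, ws in grouped.items()}


# A system of sets in which the cell ("e2", "b") occurs twice.
system = {"e1": ["a", "b"], "e2": ["b", "b", "c"]}
rows = flatten(system)

# One weight per (set, element) cell: 2 + 3 = 5 weights in total.
weights = [1.0, 2.0, 0.5, 0.5, 3.0]

# Duplicate rows are merged and their weights summed.
cell_weights = aggregate(rows, weights, how="sum")
```

Here the two occurrences of ("e2", "b") collapse into a single cell, which is why the number of weights must match the flattened row count rather than the number of sets.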

Notes

A property is a named attribute assigned to a single item in the data.

You can pass a table of properties to properties as a DataFrame:

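Level (optional) | ID           | [explicit property type] | […] | misc. properties
-----------------|--------------|--------------------------|-----|--------------------------------
0                | level 0 item | property value           |     | {property name: property value}
1                | level 1 item | property value           |     | {property name: property value}
N                | level N item | property value           |     | {property name: property value}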

The Level column is optional. If not provided, properties will be assigned by ID (i.e., if an ID appears at multiple levels, the same properties will be assigned to all occurrences).

The names of the Level (if provided) and ID columns must be specified by level_col and id_col. misc_props_col can be used to specify the name of the column to be used for miscellaneous properties; if no column by that name is found, a new column will be created and populated with empty dicts. All other columns will be considered explicit property types. The order of the columns does not matter.

This method assumes that there are no rows with the same (Level, ID); if duplicates are found, all but the first occurrence will be dropped.
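To make the two accepted shapes of properties concrete, the sketch below flattens a doubly-nested dict into (Level, ID, misc. properties) rows and then drops duplicate (Level, ID) pairs, keeping the first. It is a plain-Python illustration only; `nested_to_rows` and `drop_duplicate_keys` are hypothetical helpers, not part of the library.

```python
def nested_to_rows(props):
    """Flatten {level: {item: {name: value}}} into (level, id, misc dict) rows."""
    return [
        (level, item, dict(named))
        for level, items in props.items()
        for item, named in items.items()
    ]


def drop_duplicate_keys(rows):
    """Keep only the first row for each (level, id) pair."""
    seen, kept = set(), []
    for level, item, misc in rows:
        if (level, item) not in seen:
            seen.add((level, item))
            kept.append((level, item, misc))
    return kept


props = {0: {"e1": {"color": "red"}}, 1: {"a": {"size": 3}}}
rows = nested_to_rows(props)

# A second row for (0, "e1") is a duplicate and is discarded.
rows.append((0, "e1", {"color": "blue"}))
table = drop_duplicate_keys(rows)
```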

add(*args)[source]

Updates the underlying data table with new entity data from multiple sources

Parameters:

*args – variable length argument list of Entity and/or representations of entity data

Returns:

self

Return type:

Entity

Warning

Adding an element directly to an Entity will not add the element to any Hypergraphs constructed from that Entity, and will cause an error. Use Hypergraph.add_edge or Hypergraph.add_node_to_edge instead.

See also

add_element

update from a single source

Hypergraph.add_edge, Hypergraph.add_node_to_edge

add_element(data)[source]

Updates the underlying data table with new entity data

Supports adding from either an existing Entity or a representation of entity data (a data table or a labeled system of sets)

Parameters:

data (Entity, pandas.DataFrame, or dict of lists or sets) – new entity data

Returns:

self

Return type:

Entity

Warning

Adding an element directly to an Entity will not add the element to any Hypergraphs constructed from that Entity, and will cause an error. Use Hypergraph.add_edge or Hypergraph.add_node_to_edge instead.

See also

add

takes multiple sources of new entity data as variable length argument list

Hypergraph.add_edge, Hypergraph.add_node_to_edge

add_elements_from(arg_set)[source]

Adds arguments from an iterable to the data table one at a time

Deprecated since version 2.0.0: Duplicates add

Parameters:

arg_set (iterable) – list of Entity and/or representations of entity data

Returns:

self

Return type:

Entity

assign_properties(props: DataFrame | dict[int, dict[T, dict[Any, Any]]], misc_col: str | None = None, level_col=0, id_col=1) → None[source]

Assign new properties to items in the data table and update properties

Parameters:
  • props (pandas.DataFrame or doubly-nested dict) – See documentation of the properties parameter in Entity

  • level_col (str, optional) – name of the column corresponding to the levels; if None, defaults to _level_col.

  • id_col (str, optional) – name of the column corresponding to the items; if None, defaults to _id_col.

  • misc_col (str, optional) – name of the column corresponding to the misc. properties; if None, defaults to _misc_props_col.

See also

properties

property cell_weights

Cell weights corresponding to each row of the underlying data table

Returns:

Keyed by row of data table (as a tuple)

Return type:

dict of {tuple: int or float}

property children

Labels of all items in level 1 (second column) of the underlying data table

Return type:

frozenset

See also

uidset

Labels of all items in level 0 (first column)

uidset_by_level, uidset_by_column

property data

Sparse representation of the data table as an incidence tensor

This can also be thought of as an encoding of dataframe, where items in each column of the data table are translated to their int position in the self.labels[column] list

Returns:

2D array of ints representing rows of the underlying data table as indices in an incidence tensor

Return type:

numpy.ndarray

See also

labels, dataframe
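The encoding described for data and labels can be sketched in plain Python. This is an illustration under assumptions, not the library's code: labels are ordered by first appearance here, they are held in a list per level rather than a dict keyed by column name, and `encode_rows` and `decode_row` are hypothetical helper names.

```python
def encode_rows(rows):
    """Return (data, labels): int-encoded rows plus one label list per level."""
    n_levels = len(rows[0]) if rows else 0
    labels = [[] for _ in range(n_levels)]
    data = []
    for row in rows:
        coded = []
        for level, item in enumerate(row):
            if item not in labels[level]:
                labels[level].append(item)
            # An item is encoded as its position in that level's label list.
            coded.append(labels[level].index(item))
        data.append(tuple(coded))
    return data, labels


def decode_row(coords, labels):
    """Translate one encoded row back into item labels."""
    return [labels[level][idx] for level, idx in enumerate(coords)]


rows = [("e1", "a"), ("e1", "b"), ("e2", "b")]
data, labels = encode_rows(rows)

# Number of distinct items per level, analogous to the dimensions property.
dimensions = tuple(len(level_labels) for level_labels in labels)
```

Decoding any row of `data` with `decode_row` recovers the original labels, which is the round trip that translate_arr performs on a row of self.data.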

property dataframe

The underlying data table stored by the Entity

Return type:

pandas.DataFrame

property dimensions

Dimensions of data, i.e., the number of distinct items in each level (column) of the underlying data table

Returns:

Length and order corresponds to columns of self.dataframe (excluding cell weight column)

Return type:

tuple of ints

property dimsize

Number of levels (columns) in the underlying data table

Returns:

Equal to length of self.dimensions

Return type:

int

property elements

System of sets representation of the first two levels (columns) of the underlying data table

Each item in level 0 (first column) defines a set containing all the level 1 (second column) items with which it appears in the same row of the underlying data table

Returns:

System of sets representation as dict of {level 0 item : AttrList(level 1 items)}

Return type:

dict of AttrList

See also

incidence_dict

same data as dict of list

memberships

dual of this representation, i.e., each item in level 1 (second column) defines a set

elements_by_level, elements_by_column
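The elements representation and its dual, memberships, differ only in which column defines the sets. A minimal plain-Python sketch follows, using ordinary lists in place of AttrList; `system_of_sets` is a hypothetical helper and not a library function.

```python
def system_of_sets(rows, col1=0, col2=1):
    """Map each col1 item to the col2 items sharing a row with it."""
    out = {}
    for row in rows:
        members = out.setdefault(row[col1], [])
        if row[col2] not in members:
            members.append(row[col2])
    return out


rows = [("e1", "a"), ("e1", "b"), ("e2", "b")]

# Level 0 items define the sets.
elements = system_of_sets(rows, 0, 1)

# Swapping the columns gives the dual: level 1 items define the sets.
memberships = system_of_sets(rows, 1, 0)
```

Passing other column positions corresponds to what elements_by_level does for arbitrary pairs of levels.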

elements_by_column(col1, col2)[source]

System of sets representation of two columns (levels) of the underlying data table

Each item in col1 defines a set containing all the col2 items with which it appears in the same row of the underlying data table

Properties can be accessed and assigned to items in col1

Parameters:
  • col1 (Hashable) – name of column whose items define sets

  • col2 (Hashable) – name of column whose items are elements in the system of sets

Returns:

System of sets representation as dict of {col1 item : AttrList(col2 items)}

Return type:

dict of AttrList

See also

elements, memberships

elements_by_level

same functionality, takes level indices instead of column names

elements_by_level(level1, level2)[source]

System of sets representation of two levels (columns) of the underlying data table

Each item in level1 defines a set containing all the level2 items with which it appears in the same row of the underlying data table

Properties can be accessed and assigned to items in level1

Parameters:
  • level1 (int) – index of level whose items define sets

  • level2 (int) – index of level whose items are elements in the system of sets

Returns:

System of sets representation as dict of {level1 item : AttrList(level2 items)}

Return type:

dict of AttrList

See also

elements, memberships

elements_by_column

same functionality, takes column names instead of level indices

property empty

Whether the underlying data table is empty or not

Return type:

bool

See also

is_empty

for checking whether a specified level (column) is empty

dimsize

0 if empty

encode(data)[source]

Encode dataframe to numpy array

Parameters:

data (pandas.DataFrame) –

Return type:

numpy.ndarray

get_properties(item: T, level: int | None = None) → dict[Any, Any][source]

Get all properties of an item

Parameters:
  • item (hashable) – name of an item

  • level (int, optional) – level index of the item

Returns:

prop_vals – {named property: property value, ..., misc. property column name: {property name: property value}}

Return type:

dict

Raises:

KeyError – if (level, item) is not in properties, or if level is not provided and item is not in properties

Warns:

UserWarning – If level is not provided and item appears in multiple levels, assumes the first (closest to 0)

get_property(item: T, prop_name: Any, level: int | None = None) → Any[source]

Get a property of an item

Parameters:
  • item (hashable) – name of an item

  • prop_name (hashable) – name of the property to get

  • level (int, optional) – level index of the item

Returns:

prop_val – value of the property

Return type:

any

Raises:

KeyError – if (level, item) is not in properties, or if level is not provided and item is not in properties

Warns:

UserWarning – If level is not provided and item appears in multiple levels, assumes the first (closest to 0)
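The lookup rules shared by get_properties and get_property (explicit level, fallback to the first level, KeyError when missing) can be sketched as follows. This is an illustrative plain-Python model, assuming properties are stored in a dict keyed by (level, item); `find_properties` is a hypothetical name.

```python
import warnings


def find_properties(table, item, level=None):
    """Look up properties in a {(level, item): {name: value}} table."""
    if level is not None:
        return table[(level, item)]  # raises KeyError if (level, item) is missing
    levels = sorted(lv for lv, it in table if it == item)
    if not levels:
        raise KeyError(item)
    if len(levels) > 1:
        # Ambiguous item: fall back to the level closest to 0.
        warnings.warn(
            f"{item!r} appears in multiple levels; using level {levels[0]}",
            UserWarning,
        )
    return table[(levels[0], item)]


table = {
    (0, "x"): {"color": "red"},
    (1, "x"): {"color": "blue"},
    (1, "a"): {"size": 3},
}

# "x" exists at levels 0 and 1, so omitting level triggers a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ambiguous = find_properties(table, "x")

explicit = find_properties(table, "x", level=1)

try:
    find_properties(table, "missing")
    missing_raises = False
except KeyError:
    missing_raises = True
```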

property incidence_dict: dict[T, list[T]]

System of sets representation of the first two levels (columns) of the underlying data table

Returns:

System of sets representation as dict of {level 0 item : [level 1 items]}

Return type:

dict of list

See also

elements

same data as dict of AttrList

incidence_matrix(level1=0, level2=1, weights=False, aggregateby=None, index=False) → csr_matrix | None[source]

Incidence matrix representation for two levels (columns) of the underlying data table

If level1 and level2 contain N and M distinct items, respectively, the incidence matrix will be M x N. In other words, the items in level1 and level2 correspond to the columns and rows of the incidence matrix, respectively, in the order in which they appear in self.labels[column1] and self.labels[column2] (column1 and column2 are the column labels of level1 and level2)

Parameters:
  • level1 (int, default=0) – index of first level (column)

  • level2 (int, default=1) – index of second level

  • weights (bool or dict, default=False) – If False, all nonzero entries are 1. If True, all nonzero entries are filled with values from the self.cell_weights dictionary; use aggregateby to specify how the weights of duplicate entries should be aggregated. If a dict of the form {(level1 item, level2 item): weight value}, only nonzero cells in the incidence matrix will be updated from the dictionary, i.e., level1 item and level2 item must appear in the same row at least once in the underlying data table.

  • aggregateby ({'count', 'sum', 'mean', 'median', 'max', 'min', 'first', 'last', None}, default='count') – Method to aggregate weights of duplicate rows in data table. If None, then all cell weights will be set to 1.

  • index (bool, optional) – Not used

Returns:

sparse representation of incidence matrix (i.e. Compressed Sparse Row matrix)

Return type:

scipy.sparse.csr.csr_matrix

Note

In the context of Hypergraphs, think level1 = edges, level2 = nodes
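The orientation of the incidence matrix (level2 items as rows, level1 items as columns) is easy to get backwards, so here is a dense plain-Python sketch. It is not the library's sparse implementation: it returns nested lists instead of a csr_matrix, orders labels by first appearance, and `dense_incidence` is a hypothetical helper.

```python
def dense_incidence(rows, cell_weights=None):
    """Build an M x N matrix: rows are level 1 items, columns are level 0 items."""
    col_labels, row_labels = [], []
    for item0, item1 in rows:
        if item0 not in col_labels:
            col_labels.append(item0)
        if item1 not in row_labels:
            row_labels.append(item1)
    matrix = [[0] * len(col_labels) for _ in row_labels]
    for item0, item1 in rows:
        # Unweighted cells are 1; otherwise use the supplied cell weight.
        value = 1 if cell_weights is None else cell_weights[(item0, item1)]
        matrix[row_labels.index(item1)][col_labels.index(item0)] = value
    return matrix, row_labels, col_labels


# Two edges (level 0) over three nodes (level 1) give a 3 x 2 matrix.
rows = [("e1", "a"), ("e1", "b"), ("e2", "b"), ("e2", "c")]
matrix, nodes, edges = dense_incidence(rows)

# Same structure with every nonzero cell weighted 0.5.
weighted, _, _ = dense_incidence(rows, {row: 0.5 for row in rows})
```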

index(column, value=None)[source]

Get level index corresponding to a column and (optionally) the index of a value in that column

The index of value is its position in the list given by self.labels[column], which is used in the integer encoding of the data table self.data

Parameters:
  • column (str) – name of a column in self.dataframe

  • value (str, optional) – label of an item in the specified column

Returns:

level index corresponding to column, index of value if provided

Return type:

int or (int, int)

See also

indices

for finding indices of multiple values in a column

level

same functionality, search for the value without specifying column

indices(column, values)[source]

Get indices of one or more value(s) in a column

Parameters:
  • column (str) –

  • values (str or iterable of str) –

Returns:

indices of values

Return type:

list of int

See also

index

for finding level index of a column and index of a single value

is_empty(level=0)[source]

Whether a specified level (column) of the underlying data table is empty or not

Return type:

bool

See also

empty

for checking whether the underlying data table is empty

size

number of items in a level (columns); 0 if level is empty

property isstatic

Whether to treat the underlying data as static or not

If True, the underlying data may not be altered, and the state_dict will never be cleared. Otherwise, rows may be added to and removed from the data table, and updates will clear the state_dict.

Return type:

bool

property labels

Labels of all items in each column of the underlying data table

Returns:

dict of {column name: [item labels]}. The order of [item labels] corresponds to the int encoding of each item in self.data.

Return type:

dict of lists

See also

data, dataframe

level(item, min_level=0, max_level=None, return_index=True)[source]

First level containing the given item label

Order of levels corresponds to order of columns in self.dataframe

Parameters:
  • item (str) –

  • min_level (int, optional) – inclusive lower bound on range of levels to search for item

  • max_level (int, optional) – inclusive upper bound on range of levels to search for item

  • return_index (bool, default=True) – If True, return index of item within the level

Returns:

index of first level containing the item, plus index of item within that level if return_index=True; returns None if item is not found

Return type:

int, (int, int), or None

See also

index, indices
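The search performed by level, including its inclusive bounds and the three possible return shapes, can be modeled in a few lines. This is a sketch under assumptions: labels are held as one list per level rather than a dict keyed by column name, and `first_level` is a hypothetical name.

```python
def first_level(labels, item, min_level=0, max_level=None, return_index=True):
    """Find the first level in [min_level, max_level] whose labels contain item."""
    if max_level is None:
        max_level = len(labels) - 1
    for lv in range(min_level, max_level + 1):  # both bounds are inclusive
        if item in labels[lv]:
            return (lv, labels[lv].index(item)) if return_index else lv
    return None


# "e2" appears in both levels, so the bounds change which one is found.
labels = [["e1", "e2"], ["a", "b", "e2"]]

found_default = first_level(labels, "e2")
found_bounded = first_level(labels, "e2", min_level=1)
level_only = first_level(labels, "b", return_index=False)
not_found = first_level(labels, "zzz")
```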

property memberships

System of sets representation of the first two levels (columns) of the underlying data table

Each item in level 1 (second column) defines a set containing all the level 0 (first column) items with which it appears in the same row of the underlying data table

Returns:

System of sets representation as dict of {level 1 item : AttrList(level 0 items)}

Return type:

dict of AttrList

See also

elements

dual of this representation, i.e., each item in level 0 (first column) defines a set

elements_by_level, elements_by_column

property properties: DataFrame

Properties assigned to items in the underlying data table

Return type:

pandas.DataFrame

remove(*args)[source]

Removes all rows containing specified item(s) from the underlying data table

Parameters:

*args – variable length argument list of item labels

Returns:

self

Return type:

Entity

See also

remove_element

remove all rows containing a single specified item

remove_element(item)[source]

Removes all rows containing a specified item from the underlying data table

Parameters:

item – item label

Returns:

self

Return type:

Entity

See also

remove

same functionality, accepts variable length argument list of item labels

remove_elements_from(arg_set)[source]

Removes all rows containing specified item(s) from the underlying data table

Deprecated since version 2.0.0: Duplicates remove

Parameters:

arg_set (iterable) – list of item labels

Returns:

self

Return type:

Entity

restrict_to_indices(indices, level=0, **kwargs)[source]

Create a new Entity by restricting the data table to rows containing specific items in a given level

Parameters:
  • indices (int or iterable of int) – indices of item label(s) in level to restrict to

  • level (int, default=0) – level index

  • **kwargs – Extra arguments to Entity constructor

Return type:

Entity
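Because restrict_to_indices takes label indices rather than labels, a small model of the filtering step may help. The following plain-Python sketch assumes labels are stored as one list per level and returns the kept rows rather than a new Entity; `restrict_rows` is a hypothetical helper.

```python
def restrict_rows(rows, labels, indices, level=0):
    """Keep rows whose item in `level` has one of the given label indices."""
    if isinstance(indices, int):
        indices = [indices]
    # Translate indices to labels, then filter rows on that level.
    keep = {labels[level][i] for i in indices}
    return [row for row in rows if row[level] in keep]


rows = [("e1", "a"), ("e1", "b"), ("e2", "b"), ("e3", "c")]
labels = [["e1", "e2", "e3"], ["a", "b", "c"]]

# Indices 0 and 2 in level 0 are "e1" and "e3".
by_edge = restrict_rows(rows, labels, [0, 2])

# Index 1 in level 1 is "b".
by_node = restrict_rows(rows, labels, 1, level=1)
```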

restrict_to_levels(levels: int | Iterable[int], weights: bool = False, aggregateby: str | None = 'sum', **kwargs) → Entity[source]

Create a new Entity by restricting to a subset of levels (columns) in the underlying data table

Parameters:
  • levels (array-like of int) – indices of a subset of levels (columns) of data

  • weights (bool, default=False) – If True, aggregate existing cell weights to get new cell weights. Otherwise, all new cell weights will be 1.

  • aggregateby ({'sum', 'first', 'last', 'count', 'mean', 'median', 'max', 'min', None}, optional) – Method to aggregate weights of duplicate rows in data table. If None or weights=False, then all new cell weights will be 1.

  • **kwargs – Extra arguments to Entity constructor

Return type:

Entity

Raises:

KeyError – If levels contains any invalid values

See also

EntitySet
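Restricting to a subset of levels can create duplicate rows, which is why the weights and aggregateby parameters matter. The sketch below models only the 'sum' case in plain Python; `restrict_levels` is a hypothetical helper and it returns a weight mapping rather than a new Entity.

```python
def restrict_levels(cell_weights, levels, use_weights=False):
    """Project {full row: weight} onto the given levels, summing duplicates."""
    out = {}
    for row, weight in cell_weights.items():
        key = tuple(row[lv] for lv in levels)
        if use_weights:
            out[key] = out.get(key, 0) + weight  # aggregate by sum
        else:
            out[key] = 1  # weights ignored: every cell gets weight 1
    return out


cell_weights = {
    ("e1", "a", "x"): 1.0,
    ("e1", "a", "y"): 2.0,
    ("e2", "b", "x"): 4.0,
}

# Dropping level 2 merges the first two rows into ("e1", "a").
summed = restrict_levels(cell_weights, [0, 1], use_weights=True)
unweighted = restrict_levels(cell_weights, [0, 1])
single_level = restrict_levels(cell_weights, [2], use_weights=True)
```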

set_property(item: T, prop_name: Any, prop_val: Any, level: int | None = None) → None[source]

Set a property of an item

Parameters:
  • item (hashable) – name of an item

  • prop_name (hashable) – name of the property to set

  • prop_val (any) – value of the property to set

  • level (int, optional) – level index of the item; required if item is not already in properties

Raises:

ValueError – If level is not provided and item is not in properties

Warns:

UserWarning – If level is not provided and item appears in multiple levels, assumes the first (closest to 0)

size(level=0)[source]

The number of items in a level of the underlying data table

Equivalent to self.dimensions[level]

Parameters:

level (int, default=0) –

Return type:

int

See also

dimensions

translate(level, index)[source]

Given indices of a level and value(s), return the corresponding value label(s)

Parameters:
  • level (int) – level index

  • index (int or list of int) – value index or indices

Returns:

label(s) corresponding to value index or indices

Return type:

str or list of str

See also

translate_arr

translate a full row of value indices across all levels (columns)

translate_arr(coords)[source]

Translate a full encoded row of the data table, e.g., a row of self.data

Parameters:

coords (tuple of ints) – encoded value indices, with one value index for each level of the data

Returns:

full row of translated value labels

Return type:

list of str

property uid

User-defined unique identifier for the Entity

Return type:

hashable

property uidset

Labels of all items in level 0 (first column) of the underlying data table

Return type:

frozenset

See also

children

Labels of all items in level 1 (second column)

uidset_by_level, uidset_by_column

uidset_by_column(column)[source]

Labels of all items in a particular column (level) of the underlying data table

Parameters:

column (Hashable) – Name of a column in self.dataframe

Return type:

frozenset

See also

uidset

Labels of all items in level 0 (first column)

children

Labels of all items in level 1 (second column)

uidset_by_level

Same functionality, takes the level index instead of column name

uidset_by_level(level)[source]

Labels of all items in a particular level (column) of the underlying data table

Parameters:

level (int) –

Return type:

frozenset

See also

uidset

Labels of all items in level 0 (first column)

children

Labels of all items in level 1 (second column)

uidset_by_column

Same functionality, takes the column name instead of level index

classes.entityset module

class classes.entityset.EntitySet(entity: pd.DataFrame | np.ndarray | Mapping[T, Iterable[T]] | Iterable[Iterable[T]] | Mapping[T, Mapping[T, Mapping[T, Any]]] | None = None, data: np.ndarray | None = None, labels: OrderedDict[T, Sequence[T]] | None = None, level1: str | int = 0, level2: str | int = 1, weight_col: str | int = 'cell_weights', weights: Sequence[float] | float | int | str = 1, cell_properties: Sequence[T] | pd.DataFrame | dict[T, dict[T, dict[Any, Any]]] | None = None, misc_cell_props_col: str = 'cell_properties', uid: Hashable | None = None, aggregateby: str | None = 'sum', properties: pd.DataFrame | dict[int, dict[T, dict[Any, Any]]] | None = None, misc_props_col: str = 'properties', **kwargs)[source]

Bases: Entity

Class for handling 2-dimensional (i.e., system of sets, bipartite) data when building network-like models, e.g., Hypergraph

Parameters:
  • entity (Entity, pandas.DataFrame, dict of lists or sets, or list of lists or sets, optional) – If an Entity with N levels or a DataFrame with N columns, represents N-dimensional entity data (data table). If N > 2, only considers levels (columns) level1 and level2. Otherwise, represents 2-dimensional entity data (system of sets).

  • data (numpy.ndarray, optional) – 2D M x N ndarray of ints (data table); sparse representation of an N-dimensional incidence tensor with M nonzero cells. If N > 2, only considers levels (columns) level1 and level2. Ignored if entity is provided.

  • labels (collections.OrderedDict of lists, optional) – User-specified labels in corresponding order to ints in data. For M x N data, N > 2, labels must contain either 2 or N keys. If N keys, only considers labels for levels (columns) level1 and level2. Ignored if entity is provided or data is not provided.

  • level1 (str or int, default=0) – Each item in level1 defines a set containing all the level2 items with which it appears in the same row of the underlying data table. If int, gives the index of a level; if str, gives the name of a column in entity. Ignored if entity, data (if entity not provided), and labels (if provided) all represent 1- or 2-dimensional data (set or system of sets).

  • level2 (str or int, default=1) – Each item in level1 defines a set containing all the level2 items with which it appears in the same row of the underlying data table. If int, gives the index of a level; if str, gives the name of a column in entity. Ignored if entity, data (if entity not provided), and labels (if provided) all represent 1- or 2-dimensional data (set or system of sets).

  • weights (str or sequence of float, optional) – User-specified cell weights corresponding to entity data. If sequence of floats and entity or data defines a data table, length must equal the number of rows. If sequence of floats and entity defines a system of sets, length must equal the total sum of the sizes of all sets. If str and entity is a DataFrame, must be the name of a column in entity. Otherwise, weight for all cells is assumed to be 1. Ignored if entity is an Entity and keep_weights=True.

  • keep_weights (bool, default=True) – Whether to preserve any existing cell weights; ignored if entity is not an Entity.

  • cell_properties (str, list of str, pandas.DataFrame, or doubly-nested dict, optional) – User-specified properties to be assigned to cells of the incidence matrix, i.e., rows in a data table; pairs of (set, element of set) in a system of sets. See Notes for detailed explanation. Ignored if underlying data is 1-dimensional (set). If doubly-nested dict, {level1 item: {level2 item: {cell property name: cell property value}}}.

  • misc_cell_props_col (str, default='cell_properties') – Column name for miscellaneous cell properties; see Notes for explanation.

  • kwargs – Keyword arguments passed to the Entity constructor, e.g., static, uid, aggregateby, properties, etc. See Entity for documentation of these parameters.

Notes

A cell property is a named attribute assigned jointly to a set and one of its elements, i.e., a cell of the incidence matrix.

When an Entity or DataFrame is passed to the entity parameter of the constructor, it should represent a data table:

Column_1     | Column_2     | Column_3     | […] | Column_N
-------------|--------------|--------------|-----|-------------
level 1 item | level 2 item | level 3 item |     | level N item

Assuming the default values for parameters level1, level2, the data table will be restricted to the set system defined by Column 1 and Column 2. Since each row of the data table represents an incidence or cell, values from other columns may contain data that should be converted to cell properties.

By passing a column name or list of column names as cell_properties, each given column will be preserved in the cell_properties as an explicit cell property type. An additional column in cell_properties will be created to store a dict of miscellaneous cell properties, which will store cell properties of types that have not been explicitly defined and do not have a dedicated column (which may be assigned after construction). The name of the miscellaneous column is determined by misc_cell_props_col.

You can also pass a pre-constructed table to cell_properties as a DataFrame:

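Column_1     | Column_2     | [explicit cell prop. type] | […] | misc. cell properties
-------------|--------------|----------------------------|-----|------------------------------------------
level 1 item | level 2 item | cell property value        |     | {cell property name: cell property value}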

Column 1 and Column 2 must have the same names as the corresponding columns in the entity data table, and misc_cell_props_col can be used to specify the name of the column to be used for miscellaneous cell properties. If no column by that name is found, a new column will be created and populated with empty dicts. All other columns will be considered explicit cell property types. The order of the columns does not matter.

Both of these methods assume that there are no row duplicates in the tables passed to entity and/or cell_properties; if duplicates are found, all but the first occurrence will be dropped.
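The conversion of extra data-table columns into cell properties can be pictured with a plain-Python sketch. It is illustrative only and makes several assumptions: rows are dicts, columns not named as explicit cell properties are simply dropped, duplicate cells keep their first occurrence, and `split_cell_properties` is a hypothetical helper.

```python
def split_cell_properties(rows, level1, level2, explicit_cols,
                          misc_col="cell_properties"):
    """Build {(level1 item, level2 item): cell properties} from table rows."""
    table = {}
    for row in rows:
        key = (row[level1], row[level2])
        if key in table:
            continue  # duplicate cell: keep the first occurrence only
        props = {col: row[col] for col in explicit_cols}
        props[misc_col] = {}  # empty dict for miscellaneous cell properties
        table[key] = props
    return table


rows = [
    {"edge": "e1", "node": "a", "role": "author", "year": 2020},
    {"edge": "e1", "node": "b", "role": "editor", "year": 2021},
    {"edge": "e1", "node": "a", "role": "reviewer", "year": 2022},  # duplicate cell
]

cell_props = split_cell_properties(rows, "edge", "node", ["role"])

# A property without a dedicated column goes into the miscellaneous dict.
cell_props[("e1", "a")]["cell_properties"]["checked"] = True
```

In this model "role" plays the part of an explicit cell property type, while "checked" is stored only in the miscellaneous column.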

assign_cell_properties(cell_props: DataFrame | dict[T, dict[T, dict[Any, Any]]], misc_col: str | None = None, replace: bool = False) → None[source]

Assign new properties to cells of the incidence matrix and update properties

Parameters:
  • cell_props (pandas.DataFrame, dict of iterables, or doubly-nested dict, optional) – See documentation of the cell_properties parameter in EntitySet

  • misc_col (str, optional) – name of column to be used for miscellaneous cell property dicts

  • replace (bool, default=False) – If True, replace existing cell_properties with result; otherwise update with new values from result

Raises:

AttributeError – Not supported when dimsize = 1

property cell_properties: DataFrame | None

Properties assigned to cells of the incidence matrix

Returns:

Returns None if dimsize < 2

Return type:

pandas.DataFrame or None

collapse_identical_elements(return_equivalence_classes: bool = False, **kwargs) → EntitySet | tuple[classes.entityset.EntitySet, dict[str, list[str]]][source]

Create a new EntitySet by collapsing sets with the same set elements

Each item in level 0 (first column) defines a set containing all the level 1 (second column) items with which it appears in the same row of the underlying data table.

Parameters:
  • return_equivalence_classes (bool, default=False) – If True, return a dictionary of equivalence classes keyed by new edge names

  • **kwargs – Extra arguments to EntitySet constructor

Returns:

  • new_entity (EntitySet) – new EntitySet with identical sets collapsed; if all sets are unique, the system of sets will be the same as the original.

  • equivalence_classes (dict of lists, optional) – returned only if return_equivalence_classes=True; {collapsed set label: [level 0 item labels]}.
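Collapsing identical sets amounts to grouping level 0 items by their element sets. The plain-Python sketch below shows the idea on a dict of lists; it is not the library's implementation, `collapse_identical` is a hypothetical helper, and the "first label: class size" naming of collapsed sets is an assumption chosen for illustration.

```python
def collapse_identical(elements):
    """Merge sets with identical members; return (collapsed, equivalence classes)."""
    classes = {}
    for label, members in elements.items():
        # Sets are identical when their members match, regardless of order.
        classes.setdefault(frozenset(members), []).append(label)
    collapsed, equivalence = {}, {}
    for members, labels in classes.items():
        new_label = f"{labels[0]}: {len(labels)}"  # assumed naming scheme
        collapsed[new_label] = sorted(members)
        equivalence[new_label] = labels
    return collapsed, equivalence


elements = {"e1": ["a", "b"], "e2": ["b", "a"], "e3": ["c"]}
collapsed, equivalence = collapse_identical(elements)
```

If every set is already unique, each equivalence class has a single member and the system of sets is unchanged apart from the labels.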

get_cell_properties(item1: T, item2: T) → dict[Any, Any][source]

Get all properties of a cell, i.e., incidence between items of different levels

Parameters:
  • item1 (hashable) – name of an item in level 0

  • item2 (hashable) – name of an item in level 1

Returns:

{named cell property: cell property value, ..., misc. cell property column name: {cell property name: cell property value}}

Return type:

dict

get_cell_property(item1: T, item2: T, prop_name: Any) → Any[source]

Get a property of a cell, i.e., incidence between items of different levels

Parameters:
  • item1 (hashable) – name of an item in level 0

  • item2 (hashable) – name of an item in level 1

  • prop_name (hashable) – name of the cell property to get

Returns:

prop_val – value of the cell property

Return type:

any

property memberships: dict[str, hypernetx.classes.helpers.AttrList[str]]

Extends Entity.memberships

Each item in level 1 (second column) defines a set containing all the level 0 (first column) items with which it appears in the same row of the underlying data table.

Returns:

System of sets representation as dict of {level 1 item: AttrList(level 0 items)}.

Return type:

dict of AttrList

See also

elements

dual of this representation, i.e., each item in level 0 (first column) defines a set

restrict_to_levels

for more information on how memberships work for 1-dimensional (set) data

restrict_to(indices: int | Iterable[int], **kwargs) → EntitySet[source]

Alias of restrict_to_indices() with default parameter level=0

Parameters:
  • indices (array_like of int) – indices of item label(s) in level to restrict to

  • **kwargs – Extra arguments to EntitySet constructor

Return type:

EntitySet

See also

restrict_to_indices

restrict_to_levels(levels: int | Iterable[int], weights: bool = False, aggregateby: str | None = 'sum', keep_memberships: bool = True, **kwargs) → EntitySet[source]

Extends Entity.restrict_to_levels()

Parameters:
  • levels (array-like of int) – indices of a subset of levels (columns) of data

  • weights (bool, default=False) – If True, aggregate existing cell weights to get new cell weights. Otherwise, all new cell weights will be 1.

  • aggregateby ({'sum', 'first', 'last', 'count', 'mean', 'median', 'max', 'min', None}, optional) – Method to aggregate weights of duplicate rows in data table. If None or weights=False, then all new cell weights will be 1.

  • keep_memberships (bool, default=True) – Whether to preserve membership information for the discarded level when the new EntitySet is restricted to a single level

  • **kwargs – Extra arguments to EntitySet constructor

Return type:

EntitySet

Raises:

KeyError – If levels contains any invalid values

set_cell_property(item1: T, item2: T, prop_name: Any, prop_val: Any) None[source]

Set a property of a cell, i.e., an incidence between items of different levels

Parameters:
  • item1 (hashable) – name of an item in level 0

  • item2 (hashable) – name of an item in level 1

  • prop_name (hashable) – name of the cell property to set

  • prop_val (any) – value of the cell property to set

classes.helpers module

class classes.helpers.AttrList(entity: Entity, key: tuple[int, str | int], initlist: list | None = None)[source]

Bases: UserList

Custom list wrapper for integrated property storage in Entity

Parameters:
  • entity (hypernetx.Entity) –

  • key (tuple of (int, str or int)) – (level, item)

  • initlist (list, optional) – list of elements, passed to UserList constructor

classes.helpers.assign_weights(df, weights=1, weight_col='cell_weights')[source]
Parameters:
  • df (pandas.DataFrame) – A DataFrame to assign a weight column to

  • weights (array-like or Hashable, optional) – If numpy.ndarray with the same length as df, create a new weight column with these values. If Hashable, must be the name of a column of df to assign as the weight column. Otherwise, create a new weight column assigning a weight of 1 to every row.

  • weight_col (Hashable) – Name for new column if one is created (not used if the name of an existing column is passed as weights)

Returns:

  • df (pandas.DataFrame) – The original DataFrame with a new column added if needed

  • weight_col (str) – Name of the column assigned to hold weights

Note

TODO: move logic for default weights inside this method

classes.helpers.create_properties(props: DataFrame | dict[str | int, collections.abc.Iterable[str | int]] | dict[str | int, dict[str | int, dict[Any, Any]]] | None, index_cols: list[str], misc_col: str) DataFrame[source]

Helper function for initializing properties and cell properties

Parameters:
  • props (pandas.DataFrame, dict of iterables, doubly-nested dict, or None) – See documentation of the properties parameter in Entity, cell_properties parameter in EntitySet

  • index_cols (list of str) – names of columns to be used as levels of the MultiIndex

  • misc_col (str) – name of column to be used for miscellaneous property dicts

Returns:

with MultiIndex on index_cols; each entry of the miscellaneous column holds dict of {property name: property value}

Return type:

pandas.DataFrame

classes.helpers.dict_depth(dic, level=0)[source]
classes.helpers.encode(data: DataFrame)[source]

Encode dataframe to numpy array

Parameters:

data (dataframe) –

Return type:

numpy.array

classes.helpers.merge_nested_dicts(a, b, path=None)[source]

merges b into a
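
The one-line description leaves the merge semantics open. The following is a minimal sketch of one plausible reading: `merge_into` is a hypothetical stand-in rather than the library function, and its conflict rule (a leaf value from b overwrites the one in a) is an assumption.

```python
def merge_into(a, b):
    """Recursively merge nested dict `b` into `a` in place and return `a`.

    Assumption for this sketch: when both dicts hold a non-dict value under
    the same key, the value from `b` wins.
    """
    for key, value in b.items():
        if isinstance(a.get(key), dict) and isinstance(value, dict):
            # Both sides are dicts: descend instead of overwriting.
            merge_into(a[key], value)
        else:
            a[key] = value
    return a


props = {"e1": {1: {"w": 0.5}}}
extra = {"e1": {1: {"name": "related_to"}, 2: {"w": 0.1}}}
merged = merge_into(props, extra)
print(merged)
```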

classes.helpers.remove_row_duplicates(df, data_cols, weights=1, weight_col='cell_weights', aggregateby=None)[source]

Removes and aggregates duplicate rows of a DataFrame using groupby

Parameters:
  • df (pandas.DataFrame) – A DataFrame to remove or aggregate duplicate rows from

  • data_cols (list) – A list of column names in df to perform the groupby on / remove duplicates from

  • weights (array-like or Hashable, optional) – Argument passed to assign_weights

  • aggregateby (str, optional, default=None) – A valid aggregation method for pandas groupby. If None, drop duplicates without aggregating weights.

Returns:

  • df (pandas.DataFrame) – The DataFrame with duplicate rows removed or aggregated

  • weight_col (Hashable) – The name of the column holding aggregated weights, or None if aggregateby=None
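
The effect of aggregating duplicate rows can be illustrated without pandas. This is a sketch of the idea only: the documented helper works on a DataFrame with groupby, whereas `dedupe_rows` and `AGGREGATORS` below are hypothetical names operating on plain lists.

```python
# Hypothetical subset of aggregation methods, keyed by name.
AGGREGATORS = {
    "sum": sum,
    "max": max,
    "min": min,
    "count": len,
    "first": lambda ws: ws[0],
    "last": lambda ws: ws[-1],
}


def dedupe_rows(rows, weights, aggregateby="sum"):
    """Return the unique rows and, unless aggregateby is None, their aggregated weights."""
    grouped = {}
    for row, weight in zip(rows, weights):
        grouped.setdefault(tuple(row), []).append(weight)
    unique_rows = list(grouped)
    if aggregateby is None:
        # Duplicates are dropped and no weights are reported.
        return unique_rows, None
    aggregate = AGGREGATORS[aggregateby]
    return unique_rows, [aggregate(ws) for ws in grouped.values()]


rows = [("e1", 1), ("e1", 1), ("e2", 2)]
weights = [0.5, 0.25, 1.0]
unique_rows, summed = dedupe_rows(rows, weights, aggregateby="sum")
print(unique_rows, summed)
```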

classes.hypergraph module

class classes.hypergraph.Hypergraph(setsystem: DataFrame | ndarray | Mapping[T, Iterable[T]] | Iterable[Iterable[T]] | Mapping[T, Mapping[T, Mapping[str, Any]]] | None = None, edge_col: str | int = 0, node_col: str | int = 1, cell_weight_col: str | int | None = 'cell_weights', cell_weights: Sequence[float] | float = 1.0, cell_properties: Sequence[str | int] | Mapping[T, Mapping[T, Mapping[str, Any]]] | None = None, misc_cell_properties_col: str | int | None = None, aggregateby: str | dict[str, str] = 'first', edge_properties: DataFrame | dict[T, dict[Any, Any]] | None = None, node_properties: DataFrame | dict[T, dict[Any, Any]] | None = None, properties: DataFrame | dict[T, dict[Any, Any]] | dict[T, dict[T, dict[Any, Any]]] | None = None, misc_properties_col: str | int | None = None, edge_weight_prop_col: str | int = 'weight', node_weight_prop_col: str | int = 'weight', weight_prop_col: str | int = 'weight', default_edge_weight: float | None = None, default_node_weight: float | None = None, default_weight: float = 1.0, name: str | None = None, **kwargs)[source]

Bases: object

Parameters:
  • setsystem ((optional) dict of iterables, dict of dicts, iterable of iterables, pandas.DataFrame, numpy.ndarray, default = None) – See SetSystems below for additional setsystem requirements.

  • edge_col ((optional) str | int, default = 0) – column index (or name) in pandas.dataframe or numpy.ndarray, used for (hyper)edge ids. Will be used to reference edgeids for all set systems.

  • node_col ((optional) str | int, default = 1) – column index (or name) in pandas.dataframe or numpy.ndarray, used for node ids. Will be used to reference nodeids for all set systems.

  • cell_weight_col ((optional) str | int, default = 'cell_weights') – column index (or name) in pandas.dataframe or numpy.ndarray used for referencing cell weights. For a dict of dicts, references the key in the cell property dicts.

  • cell_weights ((optional) Sequence[float,int] | int | float , default = 1.0) – User specified cell_weights or default cell weight. Sequential values are only used if setsystem is a dataframe or ndarray in which case the sequence must have the same length and order as these objects. Sequential values are ignored for dataframes if cell_weight_col is already a column in the data frame. If cell_weights is assigned a single value then it will be used as default for missing values or when no cell_weight_col is given.

  • cell_properties ((optional) Sequence[int | str] | Mapping[T,Mapping[T,Mapping[str,Any]]], default = None) – Column names from pd.DataFrame to use as cell properties or a dict assigning cell_property to incidence pairs of edges and nodes. Will generate a misc_cell_properties column, which may have variable lengths per cell.

  • misc_cell_properties_col ((optional) str | int, default = None) – Column name of dataframe corresponding to a column of variable length property dictionaries for the cell. Ignored for other setsystem types.

  • aggregateby ((optional) str, dict, default = 'first') – By default duplicate edge,node incidences will be dropped unless specified with aggregateby. See pandas.DataFrame.agg() methods for additional syntax and usage information.

  • edge_properties ((optional) pd.DataFrame | dict, default = None) – Properties associated with edge ids. First column of dataframe or keys of dict link to edge ids in setsystem.

  • node_properties ((optional) pd.DataFrame | dict, default = None) – Properties associated with node ids. First column of dataframe or keys of dict link to node ids in setsystem.

  • properties ((optional) pd.DataFrame | dict, default = None) – Concatenation/union of edge_properties and node_properties. By default, the object id is used and should be the first column of the dataframe, or key in the dict. If there are nodes and edges with the same ids and different properties then use the edge_properties and node_properties keywords.

  • misc_properties_col ((optional) int | str, default = None) – Column of property dataframes with dtype=dict. Intended for variable length property dictionaries for the objects.

  • edge_weight_prop_col ((optional) str, default = 'weight') – Name of property in edge_properties to use for weight.

  • node_weight_prop_col ((optional) str, default = 'weight') – Name of property in node_properties to use for weight.

  • weight_prop_col ((optional) str, default = 'weight') – Name of property in properties to use for 'weight'

  • default_edge_weight ((optional) int | float, default = 1) – Used when edge weight property is missing or undefined.

  • default_node_weight ((optional) int | float, default = 1) – Used when node weight property is missing or undefined

  • name ((optional) str, default = None) – Name assigned to hypergraph

Hypergraphs in HNX 2.0

An hnx.Hypergraph H = (V,E) references a pair of disjoint sets: V = nodes (vertices) and E = (hyper)edges.

HNX allows for multi-edges by distinguishing edges by their identifiers instead of their contents. For example, if V = {1,2,3} and E = {e1,e2,e3}, where e1 = {1,2}, e2 = {1,2}, and e3 = {1,2,3}, the edges e1 and e2 contain the same set of nodes and yet are distinct and are distinguishable within H = (V,E).

New as of version 2.0, HNX provides methods to easily store and access additional metadata such as cell, edge, and node weights. Metadata associated with (edge,node) incidences are referenced as cell_properties. Metadata associated with a single edge or node is referenced as its properties.

The fundamental object needed to create a hypergraph is a setsystem. The setsystem defines the many-to-many relationships between edges and nodes in the hypergraph. Cell properties for the incidence pairs can be defined within the setsystem or in a separate pandas.Dataframe or dict. Edge and node properties are defined with a pandas.DataFrame or dict.

SetSystems

There are five types of setsystems currently accepted by the library.

  1. iterable of iterables : Barebones hypergraph uses Pandas default indexing to generate hyperedge ids. Elements must be hashable:

    >>> H = Hypergraph([{1,2},{1,2},{1,2,3}])
    
  2. dictionary of iterables : the most basic way to express many-to-many relationships providing edge ids. The elements of the iterables must be hashable:

    >>> H = Hypergraph({'e1':[1,2],'e2':[1,2],'e3':[1,2,3]})
    
  3. dictionary of dictionaries : allows cell properties to be assigned to a specific (edge, node) incidence. This is particularly useful when there are variable length dictionaries assigned to each pair:

    >>> d = {'e1':{ 1: {'w':0.5, 'name': 'related_to'},
    ...             2: {'w':0.1, 'name': 'related_to',
    ...                 'startdate': '05.13.2020'}},
    ...      'e2':{ 1: {'w':0.52, 'name': 'owned_by'},
    ...             2: {'w':0.2}},
    ...      'e3':{ 1: {'w':0.5, 'name': 'related_to'},
    ...             2: {'w':0.2, 'name': 'owner_of'},
    ...             3: {'w':1, 'type': 'relationship'}}}
    
    >>> H = Hypergraph(d, cell_weight_col='w')
    
  4. pandas.DataFrame : For large datasets and for datasets with cell properties it is most efficient to construct a hypergraph directly from a pandas.DataFrame. Incidence pairs are in the first two columns. Cell properties shared by all incidence pairs can be placed in their own column of the dataframe. Variable length dictionaries of cell properties particular to only some of the incidence pairs may be placed in a single column of the dataframe. Representing the data above as a dataframe df:

    col1   col2   w      col3
    ----   ----   ----   -----------------------------------------------
    e1     1      0.5    {'name':'related_to'}
    e1     2      0.1    {"name":"related_to", "startdate":"05.13.2020"}
    e2     1      0.52   {"name":"owned_by"}
    e2     2      0.2
    …      …      …      {…}

    The first row of the dataframe is used to reference each column.

    >>> H = Hypergraph(df,edge_col="col1",node_col="col2",
    ...                 cell_weight_col="w",misc_cell_properties_col="col3")
    
  5. numpy.ndarray : For homogeneous datasets given in an ndarray, a pandas dataframe is generated and column names are added from the edge_col and node_col arguments. Cell properties containing multiple data types are added with a separate dataframe or dict and passed through the cell_properties keyword.

    >>> import numpy as np
    >>> arr = np.array([['e1','1'],['e1','2'],
    ...                 ['e2','1'],['e2','2'],
    ...                 ['e3','1'],['e3','2'],['e3','3']])
    >>> H = Hypergraph(arr, edge_col='col1', node_col='col2')
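
The dictionary-of-dictionaries and dataframe forms carry the same information. As a rough sketch of that correspondence, the plain-Python helper below flattens a nested set system into one row per incidence pair. `flatten_cells` is a hypothetical name, and falling back to a weight of 1.0 for a missing key is an assumption based on the default cell weight described above.

```python
# Hypothetical nested set system: edge -> node -> cell properties.
cells = {
    "e1": {1: {"w": 0.5, "name": "related_to"},
           2: {"w": 0.1, "name": "related_to", "startdate": "05.13.2020"}},
    "e2": {1: {"w": 0.52, "name": "owned_by"},
           2: {"w": 0.2}},
}


def flatten_cells(cells, weight_key="w", default_weight=1.0):
    """Return one (edge, node, weight, misc properties) row per incidence pair."""
    rows = []
    for edge, nodes in cells.items():
        for node, props in nodes.items():
            # Everything except the weight goes into the variable-length dict.
            misc = {k: v for k, v in props.items() if k != weight_key}
            rows.append((edge, node, props.get(weight_key, default_weight), misc))
    return rows


table = flatten_cells(cells)
for row in table:
    print(row)
```

Each tuple corresponds to one row of the dataframe layout in item 4: two id columns, a weight column, and a column of miscellaneous property dicts.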
    

Edge and Node Properties

Properties specific to a single edge or node are passed through the keywords: edge_properties, node_properties, properties. Properties may be passed as dataframes or dicts. The first column or index of the dataframe, or the keys of the dict, correspond to the edge and/or node identifiers. If identifiers are shared among edges and nodes, or are distinct for edges and nodes, properties may be combined into a single object and passed to the properties keyword. For example:

id     weight   properties
--     ------   -------------------------------
e1     5.0      {'type':'event'}
e2     0.52     {"name":"owned_by"}
…      …        {…}
1      1.2      {'color':'red'}
2      .003     {'name':'Fido','color':'brown'}
3      1.0      {}

A properties dictionary should have the format:

dp = {id1 : {prop1:val1, prop2:val2,...}, id2 : ... }

A properties dataframe may be used for nodes and edges sharing ids but differing in properties by adding a level index using 0 for edges and 1 for nodes:

level  id     weight   properties
-----  --     ------   -------------------------------
0      e1     5.0      {'type':'event'}
0      e2     0.52     {"name":"owned_by"}
…      …      …        {…}
1      1      1.2      {'color':'red'}
1      2      .003     {'name':'Fido','color':'brown'}
…      …      …        {…}

Weights

The default key for cell and object weights is "weight". The default value is 1. Weights may be assigned and/or a new default prescribed in the constructor using cell_weight_col and cell_weights for incidence pairs, and using edge_weight_prop_col, node_weight_prop_col, weight_prop_col, default_edge_weight, and default_node_weight for node and edge weights.

adjacency_matrix(s=1, index=False, remove_empty_rows=False)[source]

The s-adjacency matrix for the hypergraph.

Parameters:
  • s (int, optional, default = 1) –

  • index (boolean, optional, default = False) – if True, will return the index of ids for rows and columns

  • remove_empty_rows (boolean, optional, default = False) –

Returns:

  • adjacency_matrix (scipy.sparse.csr.csr_matrix)

  • node_index (list) – index of ids for rows and columns
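
As an illustration of what an s-adjacency matrix encodes, the sketch below builds one from an incidence dictionary in plain Python. It assumes that two nodes are s-adjacent when they share at least s edges (the definition used in the distance notes), that entries are 0 or 1, and that the diagonal is zero. `s_adjacency` is a hypothetical helper that returns nested lists rather than a scipy sparse matrix.

```python
from itertools import combinations


def s_adjacency(incidence, s=1):
    """Return (matrix, node_index) with matrix[i][j] == 1 iff nodes i and j share >= s edges."""
    nodes = sorted({node for members in incidence.values() for node in members})
    position = {node: i for i, node in enumerate(nodes)}
    shared = [[0] * len(nodes) for _ in nodes]
    for members in incidence.values():
        # Every pair of nodes inside one edge shares that edge.
        for u, v in combinations(sorted(set(members)), 2):
            shared[position[u]][position[v]] += 1
            shared[position[v]][position[u]] += 1
    matrix = [[1 if count >= s else 0 for count in row] for row in shared]
    return matrix, nodes


incidence = {"e1": [1, 2], "e2": [1, 2], "e3": [1, 2, 3]}
A1, node_index = s_adjacency(incidence, s=1)
A2, _ = s_adjacency(incidence, s=2)
print(node_index)
print(A1)
print(A2)
```

Raising s from 1 to 2 removes node 3 from every adjacency, since it shares only one edge with the other nodes.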

auxiliary_matrix(s=1, node=True, index=False)[source]

The unweighted s-edge or node auxiliary matrix for hypergraph

Parameters:
  • s (int, optional, default = 1) –

  • node (bool, optional, default = True) – whether to return based on node or edge adjacencies

Returns:

  • auxiliary_matrix (scipy.sparse.csr.csr_matrix) – Node/Edge adjacency matrix with empty rows and columns removed

  • index (np.array) – row and column index of userids

bipartite()[source]

Constructs the NetworkX bipartite graph associated to hypergraph.

Returns:

bipartite

Return type:

nx.Graph()

Notes

Creates a bipartite networkx graph from hypergraph. The nodes and (hyper)edges of hypergraph become the nodes of bipartite graph. For every (hyper)edge e in the hypergraph and node n in e there is an edge (n,e) in the graph.
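
The construction in the note can be sketched without NetworkX: every incidence of node n in edge e contributes one pair (n, e). `bipartite_pairs` is a hypothetical helper that only lists those pairs; it does not build a graph object.

```python
def bipartite_pairs(incidence):
    """Return the (node, edge) pairs that form the edges of the bipartite graph."""
    return [(node, edge) for edge, members in incidence.items() for node in members]


incidence = {"e1": [1, 2], "e2": [1, 2], "e3": [1, 2, 3]}
pairs = bipartite_pairs(incidence)

# The two sides of the bipartition are the hypergraph's nodes and its edges.
node_side = sorted({node for node, _ in pairs})
edge_side = sorted({edge for _, edge in pairs})
print(pairs)
```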

collapse_edges(name=None, return_equivalence_classes=False, use_reps=None, return_counts=None)[source]

Constructs a new hypergraph obtained by identifying edges containing the same nodes

Parameters:
  • name (hashable, optional, default = None) –

  • return_equivalence_classes (boolean, optional, default = False) – Returns a dictionary of edge equivalence classes keyed by frozen sets of nodes

Returns:

  • new hypergraph (Hypergraph) – Equivalent edges are collapsed to a single edge named by a representative of the equivalent edges followed by a colon and the number of edges it represents.

  • equivalence_classes (dict) – A dictionary keyed by representative edge names with values equal to the edges in its equivalence class

Notes

Two edges are identified if their respective elements are the same. Using this as an equivalence relation, the uids of the edges are partitioned into equivalence classes.

The uid for each new edge is a single edge from the collapsed edges followed by a colon and the number of elements in its equivalence class.
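
The equivalence-class construction can be sketched in plain Python. `collapse_edges_sketch` is a hypothetical helper. Choosing the alphabetically first uid as the representative and keying the returned classes by the new edge name are assumptions, since the text does not say how the representative is picked.

```python
def collapse_edges_sketch(incidence):
    """Group edges with identical node sets and name each group 'rep: count'."""
    classes = {}
    for edge, members in incidence.items():
        # Edges are equivalent exactly when their node sets are equal.
        classes.setdefault(frozenset(members), []).append(edge)

    collapsed, equivalence_classes = {}, {}
    for members, edges in classes.items():
        representative = sorted(edges)[0]  # assumption: first uid in sorted order
        new_uid = f"{representative}: {len(edges)}"
        collapsed[new_uid] = sorted(members)
        equivalence_classes[new_uid] = edges
    return collapsed, equivalence_classes


incidence = {"e1": [1, 2], "e2": [1, 2], "e3": [1, 2, 3]}
collapsed, equivalence_classes = collapse_edges_sketch(incidence)
print(collapsed)
print(equivalence_classes)
```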

collapse_nodes(name=None, return_equivalence_classes=False, use_reps=None, return_counts=None) Hypergraph[source]

Constructs a new hypergraph obtained by identifying nodes contained by the same edges

Parameters:
  • name (str, optional, default = None) –

  • return_equivalence_classes (boolean, optional, default = False) – Returns a dictionary of node equivalence classes keyed by frozen sets of edges

  • use_reps (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE] Choose a single element from the collapsed nodes as uid for the new node, otherwise uses a frozen set of the uids of nodes in the equivalence class. If use_reps is True the new nodes have uids given by a tuple of the rep and the count

  • return_counts (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE]

Returns:

new hypergraph

Return type:

Hypergraph

Notes

Two nodes are identified if their respective memberships are the same. Using this as an equivalence relation, the uids of the nodes are partitioned into equivalence classes. A single member of the equivalence class is chosen to represent the class followed by the number of members of the class.

Example

>>> data = {'E1': ('a', 'b'), 'E2': ('a', 'b')}
>>> h = Hypergraph(data)
>>> h.collapse_nodes().incidence_dict
{'E1': ['a: 2'], 'E2': ['a: 2']}

collapse_nodes_and_edges(name=None, return_equivalence_classes=False, use_reps=None, return_counts=None)[source]

Returns a new hypergraph by collapsing nodes and edges.

Parameters:
  • name (str, optional, default = None) –

  • return_equivalence_classes (boolean, optional, default = False) – Returns a dictionary of edge equivalence classes keyed by frozen sets of nodes

  • use_reps (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE] Choose a single element from the collapsed elements as a representative. If use_reps is True, the new elements are keyed by a tuple of the rep and the count.

  • return_counts (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE]

Returns:

new hypergraph

Return type:

Hypergraph

Notes

Collapses the Nodes and Edges of EntitySets. Two nodes(edges) are duplicates if their respective memberships(elements) are the same. Using this as an equivalence relation, the uids of the nodes(edges) are partitioned into equivalence classes. A single member of the equivalence class is chosen to represent the class followed by the number of members of the class.

Example

>>> data = {'E1': ('a', 'b'), 'E2': ('a', 'b')}
>>> h = Hypergraph(data)
>>> h.incidence_dict
{'E1': ['a', 'b'], 'E2': ['a', 'b']}
>>> h.collapse_nodes_and_edges().incidence_dict
{'E1: 2': ['a: 2']}

component_subgraphs(return_singletons=False, name=None)[source]

Same as s_component_subgraphs() with s=1. Returns iterator.

components(edges=False)[source]

Same as s_connected_components() with s=1, but nodes are returned by default. Returns iterator.

connected_component_subgraphs(return_singletons=True, name=None)[source]

Same as s_component_subgraphs() with s=1. Returns iterator

connected_components(edges=False)[source]

Same as s_connected_components() with s=1, but nodes are returned by default. Returns iterator.

property dataframe

Returns dataframe of incidence pairs and their properties.

Return type:

pd.DataFrame

degree(node, s=1, max_size=None)[source]

The number of edges that contain node and have size at least s (and at most max_size, if given).

Parameters:
  • node (hashable) – identifier for the node.

  • s (positive integer, optional, default 1) – smallest size of edge to consider in degree

  • max_size (positive integer or None, optional, default = None) – largest size of edge to consider in degree

Return type:

int
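
Read together, the parameters describe a filtered count. A plain-Python sketch of that count, with `degree_sketch` as a hypothetical helper over an incidence dictionary, might look like this:

```python
def degree_sketch(incidence, node, s=1, max_size=None):
    """Count edges containing `node` whose size lies between s and max_size."""
    count = 0
    for members in incidence.values():
        size = len(set(members))
        if node not in members:
            continue
        if size >= s and (max_size is None or size <= max_size):
            count += 1
    return count


incidence = {"e1": [1, 2], "e2": [1, 2], "e3": [1, 2, 3]}
print(degree_sketch(incidence, 1))              # all edges containing node 1
print(degree_sketch(incidence, 1, s=3))         # only edges with at least 3 nodes
print(degree_sketch(incidence, 1, max_size=2))  # only edges with at most 2 nodes
```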

diameter(s=1)[source]

Returns the length of the longest shortest s-walk between nodes in hypergraph

Parameters:

s (int, optional, default 1) –

Returns:

diameter

Return type:

int

Raises:

HyperNetXError – If hypergraph is not s-edge-connected

Notes

Two nodes are s-adjacent if they share at least s edges. Two nodes v_start and v_end are s-walk connected if there is a sequence of nodes v_start, v_1, v_2, …, v_n-1, v_end such that consecutive nodes are s-adjacent. If the hypergraph is not connected, an error will be raised.

dim(edge)[source]

Same as size(edge)-1.

distance(source, target, s=1)[source]

Returns the shortest s-walk distance between two nodes in the hypergraph.

Parameters:
  • source (node.uid or node) – a node in the hypergraph

  • target (node.uid or node) – a node in the hypergraph

  • s (positive integer) – the number of edges

Returns:

s-walk distance

Return type:

int

See also

edge_distance

Notes

The s-distance is the shortest s-walk length between the nodes. An s-walk between nodes is a sequence of nodes that pairwise share at least s edges. The length of the shortest s-walk is 1 less than the number of nodes in the path sequence.

Uses the networkx shortest_path_length method on the graph generated by the s-adjacency matrix.
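
To make the s-walk definition concrete, the sketch below computes the same quantity with a hand-written breadth-first search instead of NetworkX. `s_neighbors` and `s_distance` are hypothetical helpers, and returning infinity for unreachable nodes is an assumption borrowed from the edge_distance description.

```python
import math
from collections import deque


def s_neighbors(incidence, node, s=1):
    """Nodes that share at least s edges with `node`."""
    shared = {}
    for members in incidence.values():
        if node in members:
            for other in set(members):
                if other != node:
                    shared[other] = shared.get(other, 0) + 1
    return {other for other, count in shared.items() if count >= s}


def s_distance(incidence, source, target, s=1):
    """Length of a shortest s-walk from source to target, or math.inf if none exists."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in s_neighbors(incidence, node, s):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return math.inf


incidence = {"e1": ["a", "b"], "e2": ["b", "c"], "e3": ["c", "d"], "e4": ["c", "d"]}
print(s_distance(incidence, "a", "d", s=1))
print(s_distance(incidence, "c", "d", s=2))
print(s_distance(incidence, "a", "d", s=2))
```

With s=1 the walk a, b, c, d has four nodes, so its length is 3. With s=2 only c and d remain adjacent, so a can no longer reach d.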

dual(name=None, switch_names=True)[source]

Constructs a new hypergraph with roles of edges and nodes of hypergraph reversed.

Parameters:
  • name (hashable, optional) –

  • switch_names (bool, optional, default = True) – reverses edge_col and node_col names unless edge_col = ‘edges’ and node_col = ‘nodes’

Return type:

Hypergraph

edge_adjacency_matrix(s=1, index=False)[source]

The s-adjacency matrix for the dual hypergraph.

Parameters:
  • s (int, optional, default 1) –

  • index (boolean, optional, default = False) – if True, will return the index of ids for rows and columns

Returns:

  • edge_adjacency_matrix (scipy.sparse.csr.csr_matrix)

  • edge_index (list) – index of ids for rows and columns

Notes

This is also the adjacency matrix for the line graph. Two edges are s-adjacent if they share at least s nodes. If remove_zeros is True, the auxiliary matrix is returned.

edge_diameter(s=1)[source]

Returns the length of the longest shortest s-walk between edges in hypergraph

Parameters:

s (int, optional, default 1) –

Returns:

edge_diameter

Return type:

int

Raises:

HyperNetXError – If hypergraph is not s-edge-connected

Notes

Two edges are s-adjacent if they share at least s nodes. Two edges e_start and e_end are s-walk connected if there is a sequence of edges e_start, e_1, e_2, …, e_n-1, e_end such that consecutive edges are s-adjacent. If the hypergraph is not connected, an error will be raised.

edge_diameters(s=1)[source]

Returns the edge diameters of the s_edge_connected component subgraphs in hypergraph.

Parameters:

s (int, optional, default 1) –

Returns:

  • maximum diameter (int)

  • list of diameters (list) – List of edge_diameters for s-edge component subgraphs in hypergraph

  • list of component (list) – List of the edge uids in the s-edge component subgraphs.

edge_distance(source, target, s=1)[source]

Returns the shortest s-walk distance between two edges in the hypergraph.

TODO: still need to return path and translate into user defined nodes and edges.

Parameters:
  • source (edge.uid or edge) – an edge in the hypergraph

  • target (edge.uid or edge) – an edge in the hypergraph

  • s (positive integer) – the number of intersections between pairwise consecutive edges

  • TODO (add edge weights) –

  • weight (None or string, optional, default = None) – if None then all edges have weight 1. If string then edge attribute string is used if available.

Returns:

s-walk distance – A shortest s-walk is computed as a sequence of edges; the s-walk distance is the number of edges in the sequence minus 1. If no such path exists, returns np.inf.

Return type:

the shortest s-walk edge distance

See also

distance

Notes

The s-distance is the shortest s-walk length between the edges. An s-walk between edges is a sequence of edges such that consecutive pairwise edges intersect in at least s nodes. The length of the shortest s-walk is 1 less than the number of edges in the path sequence.

Uses the networkx shortest_path_length method on the graph generated by the s-edge_adjacency matrix.

edge_neighbors(edge, s=1)[source]

The edges in hypergraph which share s node(s) with edge.

Parameters:
  • edge (hashable or Entity) – uid for an edge in hypergraph or the edge Entity

  • s (int, list, optional, default = 1) – Minimum number of nodes shared by neighbors with edge.

Returns:

List of edge neighbors

Return type:

list

property edge_props

Dataframe of edge properties indexed on edge ids

Return type:

pd.DataFrame

edge_size_dist()[source]

Returns the size for each edge

Return type:

np.array

property edges

Object associated with self._edges.

Return type:

EntitySet

classmethod from_bipartite(B, set_names=('edges', 'nodes'), name=None, **kwargs)[source]

Class method that creates a Hypergraph from a bipartite graph.

Parameters:
  • B (nx.Graph()) – A networkx bipartite graph. Each node in the graph has a property ‘bipartite’ taking the value of 0 or 1 indicating a 2-coloring of the graph.

  • set_names (iterable of length 2, optional, default = ['edges','nodes']) – Category names assigned to the graph nodes associated to each bipartite set

  • name (hashable, optional) –

Return type:

Hypergraph

Notes

A partition for the nodes in a bipartite graph generates a hypergraph.

>>> import networkx as nx
>>> B = nx.Graph()
>>> B.add_nodes_from([1, 2, 3, 4], bipartite=0)
>>> B.add_nodes_from(['a', 'b', 'c'], bipartite=1)
>>> B.add_edges_from([(1, 'a'), (1, 'b'), (2, 'b'), (2, 'c'),
...                   (3, 'c'), (4, 'a')])
>>> H = Hypergraph.from_bipartite(B)
>>> H.nodes, H.edges
# output: (EntitySet(_:Nodes,[1, 2, 3, 4],{}),
# EntitySet(_:Edges,['b', 'c', 'a'],{}))

classmethod from_incidence_dataframe(df, columns=None, rows=None, edge_col: str = 'edges', node_col: str = 'nodes', name=None, fillna=0, transpose=False, transforms=[], key=None, return_only_dataframe=False, **kwargs)[source]

Create a hypergraph from a Pandas Dataframe object, which has values equal to the incidence matrix of a hypergraph. Its index will identify the nodes and its columns will identify its edges.

Parameters:
  • df (pandas.DataFrame) – a real valued dataframe with a single index

  • columns ((optional) list, default = None) – restricts df to the columns with headers in this list.

  • rows ((optional) list, default = None) – restricts df to the rows indexed by the elements in this list.

  • name ((optional) string, default = None) –

  • fillna (float, default = 0) – a real value to place in empty cells. All-zero columns will not generate an edge.

  • transpose ((optional) bool, default = False) – option to transpose the dataframe; in this case df.index will identify the edges and df.columns will identify the nodes. Transpose is applied before transforms and key.

  • transforms ((optional) list, default = []) – optional list of transformations to apply to each column of the dataframe using pd.DataFrame.apply(). Transformations are applied in the order they are given (ex. abs). To apply transforms to rows or for additional functionality, consider transforming df using pandas.DataFrame methods prior to generating the hypergraph.

  • key ((optional) function, default = None) – boolean function applied to the entire dataframe.

  • return_only_dataframe ((optional) bool, default = False) – to use the incidence_dataframe with cell_properties or properties, set this to true and use it as the setsystem in the Hypergraph constructor.

See also

from_numpy_array

Return type:

Hypergraph

classmethod from_incidence_matrix(M, node_names=None, edge_names=None, node_label='nodes', edge_label='edges', name=None, key=None, **kwargs)[source]

Same as from_numpy_array.

classmethod from_numpy_array(M, node_names=None, edge_names=None, node_label='nodes', edge_label='edges', name=None, key=None, **kwargs)[source]

Create a hypergraph from a real valued matrix represented as a 2 dimensional numpy array. The matrix is converted to a matrix of 0's and 1's so that any truthy cells are converted to 1's and all others to 0's.

Parameters:
  • M (real valued array-like object, 2 dimensions) – representing a real valued matrix with rows corresponding to nodes and columns to edges

  • node_names (object, array-like, default=None) – List of node names must be the same length as M.shape[0]. If None then the node names correspond to row indices with ‘v’ prepended.

  • edge_names (object, array-like, default=None) – List of edge names must have the same length as M.shape[1]. If None then the edge names correspond to column indices with ‘e’ prepended.

  • name (hashable) –

  • key ((optional) function) – boolean function to be evaluated on each cell of the array, must be applicable to numpy.array

Return type:

Hypergraph

Note

The constructor does not generate empty edges. All zero columns in M are removed and the names corresponding to these edges are discarded.
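
The conversion rules above (truthy cells become incidences, default names take a 'v' or 'e' prefix, all-zero columns are dropped) can be sketched on nested lists. `incidence_from_matrix` is a hypothetical helper, not the class method, and it applies key cell by cell where the documented method expects a function applicable to a numpy array.

```python
def incidence_from_matrix(M, node_names=None, edge_names=None, key=None):
    """Return {edge: [nodes]} for a rows-are-nodes, columns-are-edges matrix."""
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    node_names = node_names or [f"v{i}" for i in range(n_rows)]
    edge_names = edge_names or [f"e{j}" for j in range(n_cols)]
    is_incident = key if key is not None else bool  # default: any truthy cell

    incidence = {}
    for j, edge in enumerate(edge_names):
        members = [node_names[i] for i in range(n_rows) if is_incident(M[i][j])]
        if members:  # all-zero columns do not generate an edge
            incidence[edge] = members
    return incidence


M = [[1, 0, 0],
     [0.5, 0, 2],
     [0, 0, 3]]
print(incidence_from_matrix(M))
print(incidence_from_matrix(M, key=lambda cell: cell > 1))
```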

get_cell_properties(edge: str, node: str, prop_name: str | None = None) Any | dict[str, Any][source]

Get cell properties on a specified edge and node

Parameters:
  • edge (str) – edgeid

  • node (str) – nodeid

  • prop_name (str, optional) – name of a cell property; if None, all cell properties will be returned

Returns:

cell property value if prop_name is provided, otherwise dict of all cell properties and values

Return type:

int or str or dict of {str: any}

get_linegraph(s=1, edges=True)[source]

Creates an s-linegraph for the Hypergraph. If edges=True (default), then the edges will be the vertices of the line graph. Two vertices are connected by an s-line-graph edge if the corresponding hypergraph edges intersect in at least s hypergraph nodes. If edges=False, the hypergraph nodes will be the vertices of the line graph. Two vertices are connected if the nodes they correspond to share at least s incident hyperedges.

Parameters:
  • s (int) – The width of the connections.

  • edges (bool, optional, default = True) – Determine if edges or nodes will be the vertices in the linegraph.

Returns:

A NetworkX graph.

Return type:

nx.Graph

get_properties(id, level=None, prop_name=None)[source]

Returns an object’s specific property or all properties

Parameters:
  • id (hashable) – edge or node id

  • level (int | None , optional, default = None) – if separate edge and node properties then enter 0 for edges and 1 for nodes.

  • prop_name (str | None, optional, default = None) – if None then all properties associated with the object will be returned.

Returns:

single property or dictionary of properties

Return type:

str or dict

incidence_dataframe(sort_rows=False, sort_columns=False, cell_weights=True)[source]

Returns a pandas dataframe for hypergraph indexed by the nodes and with column headers given by the edge names.

Parameters:
  • sort_rows (bool, optional, default =False) – sort rows based on hashable node names

  • sort_columns (bool, optional, default =False) – sort columns based on hashable edge names

  • cell_weights (bool, optional, default =True) –

property incidence_dict

Dictionary keyed by edge uids with values the uids of nodes in each edge

Return type:

dict

incidence_matrix(weights=False, index=False)[source]

An incidence matrix for the hypergraph indexed by nodes x edges.

Parameters:
  • weights (bool, default =False) – If False all nonzero entries are 1. If True and self.static all nonzero entries are filled by self.edges.cell_weights dictionary values.

  • index (boolean, optional, default = False) – If True return will include a dictionary of node uid : row number and edge uid : column number

Returns:

  • incidence_matrix (scipy.sparse.csr.csr_matrix or np.ndarray)

  • row_index (list) – index of node ids for rows

  • col_index (list) – index of edge ids for columns

is_connected(s=1, edges=False)[source]

Determines if hypergraph is s-connected.

Parameters:
  • s (int, optional, default 1) –

  • edges (boolean, optional, default = False) – If True, will determine if s-edge-connected. For s=1 s-edge-connected is the same as s-connected.

Returns:

is_connected

Return type:

boolean

Notes

A hypergraph is s node connected if for any two nodes v0,vn there exists a sequence of nodes v0,v1,v2,…,v(n-1),vn such that every consecutive pair of nodes v(i),v(i+1) share at least s edges.

A hypergraph is s edge connected if for any two edges e0,en there exists a sequence of edges e0,e1,e2,…,e(n-1),en such that every consecutive pair of edges e(i),e(i+1) share at least s nodes.

neighbors(node, s=1)[source]

The nodes in hypergraph which share s edge(s) with node.

Parameters:
  • node (hashable or Entity) – uid for a node in hypergraph or the node Entity

  • s (int, list, optional, default = 1) – Minimum number of edges shared by neighbors with node.

Returns:

neighbors – s-neighbors share at least s edges in the hypergraph

Return type:

list
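
As a rough illustration of the definition above, s-neighbors can be computed from a plain dict mapping edge uids to node sets. This is a minimal sketch in pure Python, not the library's implementation; the helper name s_neighbors and the sample edges are hypothetical.

```python
def s_neighbors(edges, node, s=1):
    """Return nodes sharing at least s edges with node (hypothetical helper)."""
    # Count, for every other node, how many edges it shares with `node`.
    shared = {}
    for members in edges.values():
        if node in members:
            for other in members:
                if other != node:
                    shared[other] = shared.get(other, 0) + 1
    return sorted(n for n, count in shared.items() if count >= s)


edges = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {5, 6}, "D": {6}}
print(s_neighbors(edges, 2))       # nodes sharing at least 1 edge with node 2
print(s_neighbors(edges, 2, s=2))  # nodes sharing at least 2 edges with node 2
```

Raising s narrows the result, since a neighbor must then co-occur with the node in more edges.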

node_diameters(s=1)[source]

Returns the node diameters of the connected components in hypergraph.

Returns:

list of the diameters of the s-components and list of the s-component nodes

property node_props

Dataframe of node properties indexed on node ids

Return type:

pd.DataFrame

property nodes

Object associated with self._nodes.

Return type:

EntitySet

number_of_edges(edgeset=None)[source]

The number of edges in edgeset belonging to hypergraph.

Parameters:

edgeset (an iterable of Entities, optional, default = None) – If None, then return the number of edges in hypergraph.

Returns:

number_of_edges

Return type:

int

number_of_nodes(nodeset=None)[source]

The number of nodes in nodeset belonging to hypergraph.

Parameters:

nodeset (an iterable of Entities, optional, default = None) – If None, then return the number of nodes in hypergraph.

Returns:

number_of_nodes

Return type:

int

order()[source]

The number of nodes in hypergraph.

Returns:

order

Return type:

int

property properties

Returns dataframe of edge and node properties.

Return type:

pd.DataFrame

remove(keys, level=None, name=None)[source]

Creates a new hypergraph with nodes and/or edges indexed by keys removed. More efficient for creating a restricted hypergraph if the restricted set is greater than what is being removed.

Parameters:
  • keys (list | tuple | set | Hashable) – node and/or edge id(s) to remove

  • level (None, optional) – Enter 0 to remove edges with ids in keys. Enter 1 to remove nodes with ids in keys. If None then all objects in nodes and edges with the id will be removed.

  • name (str, optional) – Name of new hypergraph

Return type:

hnx.Hypergraph

remove_edges(keys, name=None)[source]

remove_nodes(keys, name=None)[source]

remove_singletons(name=None)[source]

Constructs clone of hypergraph with singleton edges removed.

Returns:

new hypergraph

Return type:

Hypergraph

restrict_to_edges(edges, name=None)[source]

New hypergraph obtained by restricting to edges

Parameters:

edges (Iterable) – edgeids to restrict to

Return type:

hnx.Hypergraph

restrict_to_nodes(nodes, name=None)[source]

New hypergraph obtained by restricting to nodes

Parameters:

nodes (Iterable) – nodeids to restrict to

Return type:

hnx.Hypergraph

s_component_subgraphs(s=1, edges=True, return_singletons=False, name=None)[source]

Returns a generator for the induced subgraphs of s_connected components. Removes singletons unless return_singletons is set to True. Computed using s-linegraph generated either by the hypergraph (edges=True) or its dual (edges = False)

Parameters:
  • s (int, optional, default 1) –

  • edges (boolean, optional, default = True) – Determines if edge or node components are desired. Returns subgraphs equal to the hypergraph restricted to each set of nodes (edges) in the s-connected components or s-edge-connected components

  • return_singletons (bool, optional) –

Yields:

s_component_subgraphs (iterator) – Iterator returns subgraphs generated by the edges (or nodes) in the s-edge(node) components of hypergraph.

s_components(s=1, edges=True, return_singletons=True)[source]

Same as s_connected_components

s_connected_components(s=1, edges=True, return_singletons=False)[source]

Returns a generator for the s-edge-connected components or the s-node-connected components of the hypergraph.

Parameters:
  • s (int, optional, default 1) –

  • edges (boolean, optional, default = True) – If True will return edge components, if False will return node components

  • return_singletons (bool, optional, default = False) –

Notes

If edges=True, this method returns the s-edge-connected components as sets of edge uids. An s-edge-component has the property that for any two edges e1 and e2 there is a sequence of edges starting with e1 and ending with e2 such that pairwise adjacent edges in the sequence intersect in at least s nodes. If s=1 these are the path components of the hypergraph.

If edges=False, this method returns the s-node-connected components as sets of uids of nodes that are s-walk-connected. Two nodes v1 and v2 are s-walk-connected if there is a sequence of nodes starting with v1 and ending with v2 such that pairwise adjacent nodes in the sequence share at least s edges. If s=1 these are the path components of the hypergraph.

Example

>>> S = {'A':{1,2,3},'B':{2,3,4},'C':{5,6},'D':{6}}
>>> H = Hypergraph(S)
>>> list(H.s_components(edges=True))
[{'C', 'D'}, {'A', 'B'}]
>>> list(H.s_components(edges=False))
[{1, 2, 3, 4}, {5, 6}]

Yields:

s_connected_components (iterator) – Iterator returns sets of uids of the edges (or nodes) in the s-edge(node) components of hypergraph.
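
The definition in the Notes can be sketched without the library by treating two edges as adjacent when they intersect in at least s nodes and then collecting connected groups. The following is a minimal pure-Python sketch under that assumption; the helper name s_edge_components is hypothetical, it is not the library's implementation, and unlike return_singletons=False it keeps components of size 1.

```python
def s_edge_components(edges, s=1):
    """Group edge uids into s-edge-connected components (hypothetical helper)."""
    uids = list(edges)
    # Two edges are s-adjacent when they intersect in at least s nodes.
    adjacent = {u: set() for u in uids}
    for i, u in enumerate(uids):
        for v in uids[i + 1:]:
            if len(edges[u] & edges[v]) >= s:
                adjacent[u].add(v)
                adjacent[v].add(u)
    seen, components = set(), []
    for start in uids:
        if start in seen:
            continue
        # Depth-first search over the adjacency relation collects one component.
        component, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in component:
                component.add(u)
                stack.extend(adjacent[u] - component)
        seen |= component
        components.append(component)
    return components


S = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {5, 6}, "D": {6}}
print(s_edge_components(S, s=1))
print(s_edge_components(S, s=2))
```

With s=2 the edges C and D no longer belong together, because they share only one node.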

set_state(**kwargs)[source]

Allow state_dict updates from outside of class. Use with caution.

Parameters:

**kwargs – key=value pairs to save in state dictionary

property shape

(number of nodes, number of edges)

Return type:

tuple

singletons()[source]

Returns a list of singleton edges. A singleton edge is an edge of size 1 with a node of degree 1.

Returns:

singles – A list of edge uids.

Return type:

list
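
The singleton condition above (edge of size 1 whose node has degree 1) can be checked directly on a dict of edge uid to node set. This is a minimal sketch with a hypothetical helper name, not the library's code.

```python
def singleton_edges(edges):
    """Return uids of edges of size 1 whose only node has degree 1."""
    # Degree of a node = number of edges that contain it.
    degree = {}
    for members in edges.values():
        for node in members:
            degree[node] = degree.get(node, 0) + 1
    return [
        uid
        for uid, members in edges.items()
        if len(members) == 1 and all(degree[n] == 1 for n in members)
    ]


edges = {"A": {1, 2}, "B": {2}, "C": {7}}
print(singleton_edges(edges))
```

Edge B has size 1 but is not a singleton here, because its node also belongs to edge A.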

size(edge, nodeset=None)[source]

The number of nodes in nodeset that belong to edge. If nodeset is None then returns the size of edge

Parameters:

edge (hashable) – The uid of an edge in the hypergraph

Returns:

size

Return type:

int

toplexes(name=None)[source]

Returns a simple hypergraph corresponding to self.

Warning

Collapsing is no longer supported inside the toplexes method. Instead generate a new collapsed hypergraph and compute the toplexes of the new hypergraph.

Parameters:

name (str, optional, default = None) –

Module contents

class classes.Entity(entity: DataFrame | Mapping[T, Iterable[T]] | Iterable[Iterable[T]] | None = None, data_cols: Sequence[T] = [0, 1], data: ndarray | None = None, static: bool = False, labels: OrderedDict[T, Sequence[T]] | None = None, uid: Hashable | None = None, weight_col: str | int | None = 'cell_weights', weights: Sequence[float] | float | int | str | None = 1, aggregateby: str | dict | None = 'sum', properties: DataFrame | dict[int, dict[T, dict[Any, Any]]] | None = None, misc_props_col: str = 'properties', level_col: str = 'level', id_col: str = 'id')[source]

Bases: object

Base class for handling N-dimensional data when building network-like models, i.e., Hypergraph

Parameters:
  • entity (pandas.DataFrame, dict of lists or sets, list of lists or sets, optional) – If a DataFrame with N columns, represents N-dimensional entity data (data table). Otherwise, represents 2-dimensional entity data (system of sets). TODO: Test for compatibility with list of Entities and update docs

  • data (numpy.ndarray, optional) – 2D M x N ndarray of ints (data table); sparse representation of an N-dimensional incidence tensor with M nonzero cells. Ignored if entity is provided.

  • static (bool, default=False) – If True, entity data may not be altered, and the state_dict will never be cleared. Otherwise, rows may be added to and removed from the data table, and updates will clear the state_dict.

  • labels (collections.OrderedDict of lists, optional) – User-specified labels in corresponding order to ints in data. Ignored if entity is provided or data is not provided.

  • uid (hashable, optional) – A unique identifier for the object

  • weights (str or sequence of float, optional) – User-specified cell weights corresponding to entity data. If sequence of floats and entity or data defines a data table, length must equal the number of rows. If sequence of floats and entity defines a system of sets, length must equal the total sum of the sizes of all sets. If str and entity is a DataFrame, must be the name of a column in entity. Otherwise, weight for all cells is assumed to be 1.

  • aggregateby ({'sum', 'last', 'count', 'mean', 'median', 'max', 'min', 'first', None}) – Name of function to use for aggregating cell weights of duplicate rows when entity or data defines a data table, default is “sum”. If None, duplicate rows will be dropped without aggregating cell weights. Effectively ignored if entity defines a system of sets.

  • properties (pandas.DataFrame or doubly-nested dict, optional) – User-specified properties to be assigned to individual items in the data, i.e., cell entries in a data table; sets or set elements in a system of sets. See Notes for detailed explanation. If DataFrame, each row gives [optional item level, item label, optional named properties, {property name: property value}] (order of columns does not matter; see note for an example). If doubly-nested dict, {item level: {item label: {property name: property value}}}.

  • misc_props_col (str, default="properties") – Column name for miscellaneous properties in properties; see Notes for explanation.

  • level_col (str, default="level") – Column name for the level index in properties; see Notes for explanation.

  • id_col (str, default="id") – Column name for the item name in properties; see Notes for explanation.

Notes

A property is a named attribute assigned to a single item in the data.

You can pass a table of properties to properties as a DataFrame:

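| Level (optional) | ID           | [explicit property type] | […] | misc. properties                |
|------------------|--------------|--------------------------|-----|---------------------------------|
| 0                | level 0 item | property value           |     | {property name: property value} |
| 1                | level 1 item | property value           |     | {property name: property value} |
| N                | level N item | property value           |     | {property name: property value} |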

The Level column is optional. If not provided, properties will be assigned by ID (i.e., if an ID appears at multiple levels, the same properties will be assigned to all occurrences).

The names of the Level (if provided) and ID columns must be specified by level_col and id_col. misc_props_col can be used to specify the name of the column to be used for miscellaneous properties; if no column by that name is found, a new column will be created and populated with empty dicts. All other columns will be considered explicit property types. The order of the columns does not matter.

This method assumes that there are no rows with the same (Level, ID); if duplicates are found, all but the first occurrence will be dropped.
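
The table rules described in these Notes can be sketched with plain dicts standing in for DataFrame rows. This is only an illustration under stated assumptions: the helper normalize_properties is hypothetical, a Level column is assumed to be present, and the library itself operates on a pandas DataFrame.

```python
def normalize_properties(rows, level_col="level", id_col="id",
                         misc_props_col="properties"):
    """Sketch of the table rules described above (hypothetical helper)."""
    seen, table = set(), []
    for row in rows:
        key = (row[level_col], row[id_col])
        if key in seen:
            continue  # keep only the first occurrence of each (Level, ID)
        seen.add(key)
        clean = dict(row)
        # A missing misc. properties column is filled with an empty dict.
        clean.setdefault(misc_props_col, {})
        table.append(clean)
    return table


rows = [
    {"level": 0, "id": "e1", "color": "red"},
    {"level": 1, "id": "n1", "color": "blue", "properties": {"note": "hub"}},
    {"level": 0, "id": "e1", "color": "green"},  # duplicate (Level, ID)
]
for row in normalize_properties(rows):
    print(row)
```

Here "color" plays the role of an explicit property type, while "properties" holds the miscellaneous dict.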

add(*args)[source]

Updates the underlying data table with new entity data from multiple sources

Parameters:

*args – variable length argument list of Entity and/or representations of entity data

Returns:

self

Return type:

Entity

Warning

Adding an element directly to an Entity will not add the element to any Hypergraphs constructed from that Entity, and will cause an error. Use Hypergraph.add_edge or Hypergraph.add_node_to_edge instead.

See also

add_element

update from a single source

Hypergraph.add_edge, Hypergraph.add_node_to_edge

add_element(data)[source]

Updates the underlying data table with new entity data

Supports adding from either an existing Entity or a representation of entity (data table or labeled system of sets are both supported representations)

Parameters:

data (Entity, pandas.DataFrame, or dict of lists or sets) – new entity data

Returns:

self

Return type:

Entity

Warning

Adding an element directly to an Entity will not add the element to any Hypergraphs constructed from that Entity, and will cause an error. Use Hypergraph.add_edge or Hypergraph.add_node_to_edge instead.

See also

add

takes multiple sources of new entity data as variable length argument list

Hypergraph.add_edge, Hypergraph.add_node_to_edge

add_elements_from(arg_set)[source]

Adds arguments from an iterable to the data table one at a time

Deprecated since version 2.0.0: Duplicates add

Parameters:

arg_set (iterable) – list of Entity and/or representations of entity data

Returns:

self

Return type:

Entity

assign_properties(props: DataFrame | dict[int, dict[T, dict[Any, Any]]], misc_col: str | None = None, level_col=0, id_col=1) → None[source]

Assign new properties to items in the data table and update properties

Parameters:
  • props (pandas.DataFrame or doubly-nested dict) – See documentation of the properties parameter in Entity

  • level_col (str, optional) – name of the column corresponding to the levels; if None, defaults to _level_col.

  • id_col (str, optional) – name of the column corresponding to the items; if None, defaults to _id_col.

  • misc_col (str, optional) – name of the column corresponding to the misc. properties; if None, defaults to _misc_props_col.

See also

properties

property cell_weights

Cell weights corresponding to each row of the underlying data table

Returns:

Keyed by row of data table (as a tuple)

Return type:

dict of {tuple: int or float}

property children

Labels of all items in level 1 (second column) of the underlying data table

Return type:

frozenset

See also

uidset

Labels of all items in level 0 (first column)

uidset_by_level, uidset_by_column

property data

Sparse representation of the data table as an incidence tensor

This can also be thought of as an encoding of dataframe, where items in each column of the data table are translated to their int position in the self.labels[column] list

Returns:

2D array of ints representing rows of the underlying data table as indices in an incidence tensor

Return type:

numpy.ndarray

See also

labels, dataframe
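
The relationship between the labeled table, the integer encoding, and the label lists can be sketched in pure Python. This is a minimal sketch, not the library's implementation: the helper names are hypothetical, and it assumes labels are indexed in order of first appearance, which may differ from the ordering the library actually uses.

```python
def encode_table(rows, columns):
    """Encode a table of labels as ints plus per-column label lists."""
    labels = {col: [] for col in columns}
    data = []
    for row in rows:
        encoded = []
        for col, item in zip(columns, row):
            if item not in labels[col]:
                labels[col].append(item)  # first appearance fixes the index
            encoded.append(labels[col].index(item))
        data.append(encoded)
    return data, labels


def translate_row(coords, labels, columns):
    """Map one encoded row back to its labels."""
    return [labels[col][i] for col, i in zip(columns, coords)]


columns = ["edges", "nodes"]
rows = [("A", "x"), ("A", "y"), ("B", "y")]
data, labels = encode_table(rows, columns)
print(data)
print(labels)
print(translate_row(data[2], labels, columns))
```

Translating an encoded row recovers the original labels, which is the role the label lists play alongside the integer array.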

property dataframe

The underlying data table stored by the Entity

Return type:

pandas.DataFrame

property dimensions

Dimensions of data, i.e., the number of distinct items in each level (column) of the underlying data table

Returns:

Length and order corresponds to columns of self.dataframe (excluding cell weight column)

Return type:

tuple of ints

property dimsize

Number of levels (columns) in the underlying data table

Returns:

Equal to length of self.dimensions

Return type:

int

property elements

System of sets representation of the first two levels (columns) of the underlying data table

Each item in level 0 (first column) defines a set containing all the level 1 (second column) items with which it appears in the same row of the underlying data table

Returns:

System of sets representation as dict of {level 0 item : AttrList(level 1 items)}

Return type:

dict of AttrList

See also

incidence_dict

same data as dict of list

memberships

dual of this representation, i.e., each item in level 1 (second column) defines a set

elements_by_level, elements_by_column
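
The two dual views, elements and memberships, can be illustrated from a list of (level 0 item, level 1 item) rows. In this minimal sketch plain lists stand in for AttrList, and the helper name is hypothetical.

```python
from collections import defaultdict


def elements_and_memberships(rows):
    """Build both set-system views from (level 0 item, level 1 item) rows."""
    elements, memberships = defaultdict(list), defaultdict(list)
    for level0_item, level1_item in rows:
        if level1_item not in elements[level0_item]:
            elements[level0_item].append(level1_item)
        if level0_item not in memberships[level1_item]:
            memberships[level1_item].append(level0_item)
    return dict(elements), dict(memberships)


rows = [("A", 1), ("A", 2), ("B", 2), ("B", 3)]
elements, memberships = elements_and_memberships(rows)
print(elements)
print(memberships)
```

Each row contributes to both dicts, so memberships is the same incidence data read from the level 1 side.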

elements_by_column(col1, col2)[source]

System of sets representation of two columns (levels) of the underlying data table

Each item in col1 defines a set containing all the col2 items with which it appears in the same row of the underlying data table

Properties can be accessed and assigned to items in col1

Parameters:
  • col1 (Hashable) – name of column whose items define sets

  • col2 (Hashable) – name of column whose items are elements in the system of sets

Returns:

System of sets representation as dict of {col1 item : AttrList(col2 items)}

Return type:

dict of AttrList

See also

elements, memberships

elements_by_level

same functionality, takes level indices instead of column names

elements_by_level(level1, level2)[source]

System of sets representation of two levels (columns) of the underlying data table

Each item in level1 defines a set containing all the level2 items with which it appears in the same row of the underlying data table

Properties can be accessed and assigned to items in level1

Parameters:
  • level1 (int) – index of level whose items define sets

  • level2 (int) – index of level whose items are elements in the system of sets

Returns:

System of sets representation as dict of {level1 item : AttrList(level2 items)}

Return type:

dict of AttrList

See also

elements, memberships

elements_by_column

same functionality, takes column names instead of level indices

property empty

Whether the underlying data table is empty or not

Return type:

bool

See also

is_empty

for checking whether a specified level (column) is empty

dimsize

0 if empty

encode(data)[source]

Encode dataframe to numpy array

Parameters:

data (dataframe) –

Return type:

numpy.array

get_properties(item: T, level: int | None = None) → dict[Any, Any][source]

Get all properties of an item

Parameters:
  • item (hashable) – name of an item

  • level (int, optional) – level index of the item

Returns:

prop_vals – {named property: property value, ..., misc. property column name: {property name: property value}}

Return type:

dict

Raises:

KeyError – if (level, item) is not in properties, or if level is not provided and item is not in properties

Warns:

UserWarning – If level is not provided and item appears in multiple levels, assumes the first (closest to 0)

get_property(item: T, prop_name: Any, level: int | None = None) → Any[source]

Get a property of an item

Parameters:
  • item (hashable) – name of an item

  • prop_name (hashable) – name of the property to get

  • level (int, optional) – level index of the item

Returns:

prop_val – value of the property

Return type:

any

Raises:

KeyError – if (level, item) is not in properties, or if level is not provided and item is not in properties

Warns:

UserWarning – If level is not provided and item appears in multiple levels, assumes the first (closest to 0)
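
The lookup behavior documented above (KeyError when the item is unknown, UserWarning when the level is inferred from several candidates) can be sketched over a doubly-nested dict of the form {level: {item: {property name: value}}}. This is a hypothetical stand-in for illustration, not the library's implementation, which stores properties in a DataFrame.

```python
import warnings


def get_property(properties, item, prop_name, level=None):
    """Look up one property in a {level: {item: {name: value}}} dict."""
    if level is None:
        levels = sorted(lvl for lvl, items in properties.items() if item in items)
        if not levels:
            raise KeyError(f"no properties initialized for {item!r}")
        if len(levels) > 1:
            # warnings.warn issues a UserWarning by default.
            warnings.warn(f"{item!r} appears in multiple levels; using the first")
        level = levels[0]
    return properties[level][item][prop_name]


properties = {
    0: {"e1": {"color": "red"}},
    1: {"e1": {"color": "blue"}, "n1": {"size": 3}},
}
print(get_property(properties, "n1", "size"))
print(get_property(properties, "e1", "color", level=1))
```

Passing level explicitly avoids the ambiguity for items such as "e1" that appear at more than one level.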

property incidence_dict: dict[T, list[T]]

System of sets representation of the first two levels (columns) of the underlying data table

Returns:

System of sets representation as dict of {level 0 item : AttrList(level 1 items)}

Return type:

dict of list

See also

elements

same data as dict of AttrList

incidence_matrix(level1=0, level2=1, weights=False, aggregateby=None, index=False) → csr_matrix | None[source]

Incidence matrix representation for two levels (columns) of the underlying data table

If level1 and level2 contain N and M distinct items, respectively, the incidence matrix will be M x N. In other words, the items in level1 and level2 correspond to the columns and rows of the incidence matrix, respectively, in the order in which they appear in self.labels[column1] and self.labels[column2] (column1 and column2 are the column labels of level1 and level2)

Parameters:
  • level1 (int, default=0) – index of first level (column)

  • level2 (int, default=1) – index of second level

  • weights (bool or dict, default=False) – If False, all nonzero entries are 1. If True, all nonzero entries are filled by self.cell_weights dictionary values; use aggregateby to specify how weights of duplicate entries should be aggregated. If a dict of the form {(level1 item, level2 item): weight value}, only nonzero cells in the incidence matrix will be updated by the dictionary, i.e., level1 item and level2 item must appear in the same row at least once in the underlying data table

  • aggregateby ({'last', 'count', 'sum', 'mean', 'median', 'max', 'min', 'first', None}, default=None) – Method to aggregate weights of duplicate rows in data table. If None, then all cell weights will be set to 1.

  • index (bool, optional) – Not used

Returns:

sparse representation of incidence matrix (i.e. Compressed Sparse Row matrix)

Return type:

scipy.sparse.csr.csr_matrix

Note

In the context of Hypergraphs, think level1 = edges, level2 = nodes
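
The orientation described above (level2 items as rows, level1 items as columns) is easy to get backwards, so a small dense sketch may help. It uses nested lists rather than a scipy sparse matrix, the helper name is hypothetical, and the label orders are passed in explicitly as an assumption.

```python
def incidence_matrix(rows, level1_labels, level2_labels):
    """Dense M x N 0/1 matrix: rows follow level2, columns follow level1."""
    col_of = {label: j for j, label in enumerate(level1_labels)}
    row_of = {label: i for i, label in enumerate(level2_labels)}
    matrix = [[0] * len(level1_labels) for _ in level2_labels]
    for level1_item, level2_item in rows:
        matrix[row_of[level2_item]][col_of[level1_item]] = 1
    return matrix


rows = [("A", "x"), ("A", "y"), ("B", "y"), ("B", "z")]
for line in incidence_matrix(rows, ["A", "B"], ["x", "y", "z"]):
    print(line)
```

With two level1 items (edges) and three level2 items (nodes), the result has 3 rows and 2 columns.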

index(column, value=None)[source]

Get level index corresponding to a column and (optionally) the index of a value in that column

The index of value is its position in the list given by self.labels[column], which is used in the integer encoding of the data table self.data

Parameters:
  • column (str) – name of a column in self.dataframe

  • value (str, optional) – label of an item in the specified column

Returns:

level index corresponding to column, index of value if provided

Return type:

int or (int, int)

See also

indices

for finding indices of multiple values in a column

level

same functionality, search for the value without specifying column

indices(column, values)[source]

Get indices of one or more value(s) in a column

Parameters:
  • column (str) –

  • values (str or iterable of str) –

Returns:

indices of values

Return type:

list of int

See also

index

for finding level index of a column and index of a single value

is_empty(level=0)[source]

Whether a specified level (column) of the underlying data table is empty or not

Return type:

bool

See also

empty

for checking whether the underlying data table is empty

size

number of items in a level (columns); 0 if level is empty

property isstatic

Whether to treat the underlying data as static or not

If True, the underlying data may not be altered, and the state_dict will never be cleared. Otherwise, rows may be added to and removed from the data table, and updates will clear the state_dict.

Return type:

bool

property labels

Labels of all items in each column of the underlying data table

Returns:

dict of {column name: [item labels]}. The order of [item labels] corresponds to the int encoding of each item in self.data.

Return type:

dict of lists

See also

data, dataframe

level(item, min_level=0, max_level=None, return_index=True)[source]

First level containing the given item label

Order of levels corresponds to order of columns in self.dataframe

Parameters:
  • item (str) –

  • min_level (int, optional) – inclusive bounds on range of levels to search for item

  • max_level (int, optional) – inclusive bounds on range of levels to search for item

  • return_index (bool, default=True) – If True, return index of item within the level

Returns:

index of first level containing the item, and index of the item within that level if return_index=True; returns None if item is not found

Return type:

int, (int, int), or None

See also

index, indices

property memberships

System of sets representation of the first two levels (columns) of the underlying data table

Each item in level 1 (second column) defines a set containing all the level 0 (first column) items with which it appears in the same row of the underlying data table

Returns:

System of sets representation as dict of {level 1 item : AttrList(level 0 items)}

Return type:

dict of AttrList

See also

elements

dual of this representation, i.e., each item in level 0 (first column) defines a set

elements_by_level, elements_by_column

property properties: DataFrame

Properties assigned to items in the underlying data table

Return type:

pandas.DataFrame

remove(*args)[source]

Removes all rows containing specified item(s) from the underlying data table

Parameters:

*args – variable length argument list of item labels

Returns:

self

Return type:

Entity

See also

remove_element

remove all rows containing a single specified item

remove_element(item)[source]

Removes all rows containing a specified item from the underlying data table

Parameters:

item – item label

Returns:

self

Return type:

Entity

See also

remove

same functionality, accepts variable length argument list of item labels

remove_elements_from(arg_set)[source]

Removes all rows containing specified item(s) from the underlying data table

Deprecated since version 2.0.0: Duplicates remove

Parameters:

arg_set (iterable) – list of item labels

Returns:

self

Return type:

Entity

restrict_to_indices(indices, level=0, **kwargs)[source]

Create a new Entity by restricting the data table to rows containing specific items in a given level

Parameters:
  • indices (int or iterable of int) – indices of item label(s) in level to restrict to

  • level (int, default=0) – level index

  • **kwargs – Extra arguments to Entity constructor

Return type:

Entity

restrict_to_levels(levels: int | Iterable[int], weights: bool = False, aggregateby: str | None = 'sum', **kwargs) → Entity[source]

Create a new Entity by restricting to a subset of levels (columns) in the underlying data table

Parameters:
  • levels (array-like of int) – indices of a subset of levels (columns) of data

  • weights (bool, default=False) – If True, aggregate existing cell weights to get new cell weights. Otherwise, all new cell weights will be 1.

  • aggregateby ({'sum', 'first', 'last', 'count', 'mean', 'median', 'max', 'min', None}, optional) – Method to aggregate weights of duplicate rows in data table. If None or weights=False, then all new cell weights will be 1.

  • **kwargs – Extra arguments to Entity constructor

Return type:

Entity

Raises:

KeyError – If levels contains any invalid values

See also

EntitySet
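
Restricting to a subset of levels can create duplicate rows, which is why an aggregation method is needed for the cell weights. The idea can be sketched in pure Python as follows; the helper restrict_rows and its aggregate argument are hypothetical, and the library performs the equivalent step on its DataFrame.

```python
def restrict_rows(rows, weights, levels, aggregate=sum):
    """Project rows onto `levels` and aggregate weights of duplicate rows."""
    grouped = {}
    for row, weight in zip(rows, weights):
        key = tuple(row[level] for level in levels)
        grouped.setdefault(key, []).append(weight)
    return {key: aggregate(values) for key, values in grouped.items()}


rows = [("A", "x", "p"), ("A", "x", "q"), ("B", "y", "p")]
weights = [1.0, 2.0, 0.5]
print(restrict_rows(rows, weights, levels=[0, 1]))
print(restrict_rows(rows, weights, levels=[0, 1], aggregate=max))
```

The first two rows collapse to the same (level 0, level 1) pair once the third level is dropped, so their weights are combined.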

set_property(item: T, prop_name: Any, prop_val: Any, level: int | None = None) → None[source]

Set a property of an item

Parameters:
  • item (hashable) – name of an item

  • prop_name (hashable) – name of the property to set

  • prop_val (any) – value of the property to set

  • level (int, optional) – level index of the item; required if item is not already in properties

Raises:

ValueError – If level is not provided and item is not in properties

Warns:

UserWarning – If level is not provided and item appears in multiple levels, assumes the first (closest to 0)

size(level=0)[source]

The number of items in a level of the underlying data table

Equivalent to self.dimensions[level]

Parameters:

level (int, default=0) –

Return type:

int

See also

dimensions

translate(level, index)[source]

Given indices of a level and value(s), return the corresponding value label(s)

Parameters:
  • level (int) – level index

  • index (int or list of int) – value index or indices

Returns:

label(s) corresponding to value index or indices

Return type:

str or list of str

See also

translate_arr

translate a full row of value indices across all levels (columns)

translate_arr(coords)[source]

Translate a full encoded row of the data table e.g., a row of self.data

Parameters:

coords (tuple of ints) – encoded value indices, with one value index for each level of the data

Returns:

full row of translated value labels

Return type:

list of str

property uid

User-defined unique identifier for the Entity

Return type:

hashable

property uidset

Labels of all items in level 0 (first column) of the underlying data table

Return type:

frozenset

See also

children

Labels of all items in level 1 (second column)

uidset_by_level, uidset_by_column

uidset_by_column(column)[source]

Labels of all items in a particular column (level) of the underlying data table

Parameters:

column (Hashable) – Name of a column in self.dataframe

Return type:

frozenset

See also

uidset

Labels of all items in level 0 (first column)

children

Labels of all items in level 1 (second column)

uidset_by_level

Same functionality, takes the level index instead of column name

uidset_by_level(level)[source]

Labels of all items in a particular level (column) of the underlying data table

Parameters:

level (int) –

Return type:

frozenset

See also

uidset

Labels of all items in level 0 (first column)

children

Labels of all items in level 1 (second column)

uidset_by_column

Same functionality, takes the column name instead of level index

class classes.EntitySet(entity: pd.DataFrame | np.ndarray | Mapping[T, Iterable[T]] | Iterable[Iterable[T]] | Mapping[T, Mapping[T, Mapping[T, Any]]] | None = None, data: np.ndarray | None = None, labels: OrderedDict[T, Sequence[T]] | None = None, level1: str | int = 0, level2: str | int = 1, weight_col: str | int = 'cell_weights', weights: Sequence[float] | float | int | str = 1, cell_properties: Sequence[T] | pd.DataFrame | dict[T, dict[T, dict[Any, Any]]] | None = None, misc_cell_props_col: str = 'cell_properties', uid: Hashable | None = None, aggregateby: str | None = 'sum', properties: pd.DataFrame | dict[int, dict[T, dict[Any, Any]]] | None = None, misc_props_col: str = 'properties', **kwargs)[source]

Bases: Entity

Class for handling 2-dimensional (i.e., system of sets, bipartite) data when building network-like models, i.e., Hypergraph

Parameters:
  • entity (Entity, pandas.DataFrame, dict of lists or sets, or list of lists or sets, optional) – If an Entity with N levels or a DataFrame with N columns, represents N-dimensional entity data (data table). If N > 2, only considers levels (columns) level1 and level2. Otherwise, represents 2-dimensional entity data (system of sets).

  • data (numpy.ndarray, optional) – 2D M x N ndarray of ints (data table); sparse representation of an N-dimensional incidence tensor with M nonzero cells. If N > 2, only considers levels (columns) level1 and level2. Ignored if entity is provided.

  • labels (collections.OrderedDict of lists, optional) – User-specified labels in corresponding order to ints in data. For M x N data, N > 2, labels must contain either 2 or N keys. If N keys, only considers labels for levels (columns) level1 and level2. Ignored if entity is provided or data is not provided.

  • level1 (str or int, default=0) – Each item in level1 defines a set containing all the level2 items with which it appears in the same row of the underlying data table. If int, gives the index of a level; if str, gives the name of a column in entity. Ignored if entity, data (if entity not provided), and labels (if provided) all represent 1- or 2-dimensional data (set or system of sets).

  • level2 (str or int, default=1) – Each item in level1 defines a set containing all the level2 items with which it appears in the same row of the underlying data table. If int, gives the index of a level; if str, gives the name of a column in entity. Ignored if entity, data (if entity not provided), and labels (if provided) all represent 1- or 2-dimensional data (set or system of sets).

  • weights (str or sequence of float, optional) – User-specified cell weights corresponding to entity data. If sequence of floats and entity or data defines a data table, length must equal the number of rows. If sequence of floats and entity defines a system of sets, length must equal the total sum of the sizes of all sets. If str and entity is a DataFrame, must be the name of a column in entity. Otherwise, weight for all cells is assumed to be 1. Ignored if entity is an Entity and keep_weights=True.

  • keep_weights (bool, default=True) – Whether to preserve any existing cell weights; ignored if entity is not an Entity.

  • cell_properties (str, list of str, pandas.DataFrame, or doubly-nested dict, optional) – User-specified properties to be assigned to cells of the incidence matrix, i.e., rows in a data table; pairs of (set, element of set) in a system of sets. See Notes for detailed explanation. Ignored if underlying data is 1-dimensional (set). If doubly-nested dict, {level1 item: {level2 item: {cell property name: cell property value}}}.

  • misc_cell_props_col (str, default='cell_properties') – Column name for miscellaneous cell properties; see Notes for explanation.

  • kwargs – Keyword arguments passed to the Entity constructor, e.g., static, uid, aggregateby, properties, etc. See Entity for documentation of these parameters.

Notes

A cell property is a named attribute assigned jointly to a set and one of its elements, i.e., a cell of the incidence matrix.

When an Entity or DataFrame is passed to the entity parameter of the constructor, it should represent a data table:

| Column_1     | Column_2     | Column_3     | […] | Column_N     |
|--------------|--------------|--------------|-----|--------------|
| level 1 item | level 2 item | level 3 item |     | level N item |

Assuming the default values for parameters level1, level2, the data table will be restricted to the set system defined by Column 1 and Column 2. Since each row of the data table represents an incidence or cell, values from other columns may contain data that should be converted to cell properties.

By passing a column name or list of column names as cell_properties, each given column will be preserved in the cell_properties as an explicit cell property type. An additional column in cell_properties will be created to store a dict of miscellaneous cell properties, which will store cell properties of types that have not been explicitly defined and do not have a dedicated column (which may be assigned after construction). The name of the miscellaneous column is determined by misc_cell_props_col.

You can also pass a pre-constructed table to cell_properties as a DataFrame:

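+--------------+--------------+----------------------------+-----+-------------------------------------------+
| Column_1     | Column_2     | [explicit cell prop. type] | […] | misc. cell properties                     |
+==============+==============+============================+=====+===========================================+
| level 1 item | level 2 item | cell property value        | […] | {cell property name: cell property value} |
+--------------+--------------+----------------------------+-----+-------------------------------------------+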

Column 1 and Column 2 must have the same names as the corresponding columns in the entity data table, and misc_cell_props_col can be used to specify the name of the column to be used for miscellaneous cell properties. If no column by that name is found, a new column will be created and populated with empty dicts. All other columns will be considered explicit cell property types. The order of the columns does not matter.

Both of these methods assume that there are no row duplicates in the tables passed to entity and/or cell_properties; if duplicates are found, all but the first occurrence will be dropped.
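
The split between explicit cell property columns and the miscellaneous dict column can be sketched in plain Python. This is only an illustration of the layout described in these Notes, not the EntitySet implementation; the function name and the row keys "level1" and "level2" are hypothetical.

```python
def cell_property_rows(nested, explicit=(), misc_col="cell_properties"):
    """Flatten {level1 item: {level2 item: {prop: value}}} into table rows.

    Properties named in `explicit` get their own column; everything else
    is collected into a dict stored under `misc_col`.
    """
    rows = []
    for item1, inner in nested.items():
        for item2, props in inner.items():
            row = {"level1": item1, "level2": item2}
            for name in explicit:
                row[name] = props.get(name)
            row[misc_col] = {k: v for k, v in props.items() if k not in explicit}
            rows.append(row)
    return rows


nested = {
    "e1": {1: {"w": 0.5, "name": "related_to"}},
    "e2": {2: {"w": 0.2}},
}
rows = cell_property_rows(nested, explicit=("w",))
```

Here "w" becomes an explicit column, while "name" ends up in the miscellaneous dict of the first row and the second row gets an empty dict.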

assign_cell_properties(cell_props: DataFrame | dict[T, dict[T, dict[Any, Any]]], misc_col: str | None = None, replace: bool = False) None[source]

Assign new properties to cells of the incidence matrix and update properties

Parameters:
  • cell_props (pandas.DataFrame, dict of iterables, or doubly-nested dict, optional) – See documentation of the cell_properties parameter in EntitySet

  • misc_col (str, optional) – name of column to be used for miscellaneous cell property dicts

  • replace (bool, default=False) – If True, replace existing cell_properties with result; otherwise update with new values from result

Raises:

AttributeError – Not supported for dimsize=1

property cell_properties: DataFrame | None

Properties assigned to cells of the incidence matrix

Returns:

Returns None if dimsize < 2

Return type:

pandas.DataFrame, optional

collapse_identical_elements(return_equivalence_classes: bool = False, **kwargs) EntitySet | tuple[hypernetx.classes.entityset.EntitySet, dict[str, list[str]]][source]

Create a new EntitySet by collapsing sets with the same set elements

Each item in level 0 (first column) defines a set containing all the level 1 (second column) items with which it appears in the same row of the underlying data table.

Parameters:
  • return_equivalence_classes (bool, default=False) – If True, return a dictionary of equivalence classes keyed by new edge names

  • **kwargs – Extra arguments to EntitySet constructor

Returns:

  • new_entity (EntitySet) – new EntitySet with identical sets collapsed; if all sets are unique, the system of sets will be the same as the original.

  • equivalence_classes (dict of lists, optional) – if return_equivalence_classes=True, {collapsed set label: [level 0 item labels]}.
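
The idea of collapsing identical sets can be sketched without the library. The function below is a hypothetical stand-in, not the EntitySet method; the "representative: count" label format is an assumption borrowed from the collapse examples elsewhere in this reference.

```python
from collections import defaultdict


def collapse_identical(elements):
    """Group sets that have exactly the same members (illustrative only)."""
    classes = defaultdict(list)
    for label, members in elements.items():
        classes[frozenset(members)].append(label)

    collapsed, equivalence_classes = {}, {}
    for members, labels in classes.items():
        labels = sorted(labels)
        # Assumed naming scheme: "<representative>: <size of class>"
        new_label = f"{labels[0]}: {len(labels)}"
        collapsed[new_label] = sorted(members)
        equivalence_classes[new_label] = labels
    return collapsed, equivalence_classes


elements = {"e1": [1, 2], "e2": [1, 2], "e3": [1, 2, 3]}
new_elements, eq_classes = collapse_identical(elements)
```

In this sketch e1 and e2 fall into one class because their members coincide, while e3 stays in a class of its own.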

get_cell_properties(item1: T, item2: T) dict[Any, Any][source]

Get all properties of a cell, i.e., incidence between items of different levels

Parameters:
  • item1 (hashable) – name of an item in level 0

  • item2 (hashable) – name of an item in level 1

Returns:

{named cell property: cell property value, ..., misc. cell property column name: {cell property name: cell property value}}

Return type:

dict

get_cell_property(item1: T, item2: T, prop_name: Any) Any[source]

Get a property of a cell, i.e., an incidence between items of different levels

Parameters:
  • item1 (hashable) – name of an item in level 0

  • item2 (hashable) – name of an item in level 1

  • prop_name (hashable) – name of the cell property to get

Returns:

prop_val – value of the cell property

Return type:

any

property memberships: dict[str, hypernetx.classes.helpers.AttrList[str]]

Extends Entity.memberships

Each item in level 1 (second column) defines a set containing all the level 0 (first column) items with which it appears in the same row of the underlying data table.

Returns:

System of sets representation as dict of {level 1 item: AttrList(level 0 items)}.

Return type:

dict of AttrList

See also

elements

dual of this representation, i.e., each item in level 0 (first column) defines a set

restrict_to_levels

for more information on how memberships work for 1-dimensional (set) data

restrict_to(indices: int | Iterable[int], **kwargs) EntitySet[source]

Alias of restrict_to_indices() with default parameter level=0

Parameters:
  • indices (array_like of int) – indices of item label(s) in level to restrict to

  • **kwargs – Extra arguments to EntitySet constructor

Return type:

EntitySet

See also

restrict_to_indices

restrict_to_levels(levels: int | Iterable[int], weights: bool = False, aggregateby: str | None = 'sum', keep_memberships: bool = True, **kwargs) EntitySet[source]

Extends Entity.restrict_to_levels()

Parameters:
  • levels (array-like of int) – indices of a subset of levels (columns) of data

  • weights (bool, default=False) – If True, aggregate existing cell weights to get new cell weights. Otherwise, all new cell weights will be 1.

  • aggregateby ({'sum', 'first', 'last', 'count', 'mean', 'median', 'max', 'min', None}, optional) – Method to aggregate weights of duplicate rows in data table. If None or weights=False, then all new cell weights will be 1.

  • keep_memberships (bool, default=True) – Whether to preserve membership information for the discarded level when the new EntitySet is restricted to a single level

  • **kwargs – Extra arguments to EntitySet constructor

Return type:

EntitySet

Raises:

KeyError – If levels contains any invalid values
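
Restricting a data table to a subset of levels can create duplicate rows, which is why an aggregation method is needed. The following plain-Python sketch shows the 'sum' case only; the function name and its arguments are hypothetical and do not mirror the EntitySet signature.

```python
from collections import defaultdict


def restrict_rows(rows, weights, levels, aggregate=True):
    """Project rows onto `levels`; sum the weights of duplicate rows.

    With aggregate=False every surviving row gets weight 1, mirroring
    the documented behaviour for weights=False.
    """
    totals = defaultdict(float)
    for row, w in zip(rows, weights):
        key = tuple(row[i] for i in levels)
        totals[key] += w
    if not aggregate:
        return {key: 1 for key in totals}
    return dict(totals)


rows = [("e1", "a", "x"), ("e1", "a", "y"), ("e2", "b", "x")]
weights = [0.5, 1.5, 2.0]
summed = restrict_rows(rows, weights, levels=[0, 1])
unweighted = restrict_rows(rows, weights, levels=[0, 1], aggregate=False)
```

The first two rows collapse to the same (level 0, level 1) pair once the third column is dropped, so their weights are added together.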

set_cell_property(item1: T, item2: T, prop_name: Any, prop_val: Any) None[source]

Set a property of a cell, i.e., an incidence between items of different levels

Parameters:
  • item1 (hashable) – name of an item in level 0

  • item2 (hashable) – name of an item in level 1

  • prop_name (hashable) – name of the cell property to set

  • prop_val (any) – value of the cell property to set

class classes.Hypergraph(setsystem: DataFrame | ndarray | Mapping[T, Iterable[T]] | Iterable[Iterable[T]] | Mapping[T, Mapping[T, Mapping[str, Any]]] | None = None, edge_col: str | int = 0, node_col: str | int = 1, cell_weight_col: str | int | None = 'cell_weights', cell_weights: Sequence[float] | float = 1.0, cell_properties: Sequence[str | int] | Mapping[T, Mapping[T, Mapping[str, Any]]] | None = None, misc_cell_properties_col: str | int | None = None, aggregateby: str | dict[str, str] = 'first', edge_properties: DataFrame | dict[T, dict[Any, Any]] | None = None, node_properties: DataFrame | dict[T, dict[Any, Any]] | None = None, properties: DataFrame | dict[T, dict[Any, Any]] | dict[T, dict[T, dict[Any, Any]]] | None = None, misc_properties_col: str | int | None = None, edge_weight_prop_col: str | int = 'weight', node_weight_prop_col: str | int = 'weight', weight_prop_col: str | int = 'weight', default_edge_weight: float | None = None, default_node_weight: float | None = None, default_weight: float = 1.0, name: str | None = None, **kwargs)[source]

Bases: object

Parameters:
  • setsystem ((optional) dict of iterables, dict of dicts, iterable of iterables, pandas.DataFrame, numpy.ndarray, default = None) – See SetSystems below for additional setsystem requirements.

  • edge_col ((optional) str | int, default = 0) – column index (or name) in pandas.DataFrame or numpy.ndarray, used for (hyper)edge ids. Will be used to reference edge ids for all set systems.

  • node_col ((optional) str | int, default = 1) – column index (or name) in pandas.DataFrame or numpy.ndarray, used for node ids. Will be used to reference node ids for all set systems.

  • cell_weight_col ((optional) str | int, default = 'cell_weights') – column index (or name) in pandas.DataFrame or numpy.ndarray used for referencing cell weights. For a dict of dicts, references the key in the cell property dicts.

  • cell_weights ((optional) Sequence[float,int] | int | float , default = 1.0) – User-specified cell_weights or default cell weight. Sequential values are only used if setsystem is a dataframe or ndarray, in which case the sequence must have the same length and order as these objects. Sequential values are ignored for dataframes if cell_weight_col is already a column in the dataframe. If cell_weights is assigned a single value, then it will be used as default for missing values or when no cell_weight_col is given.

  • cell_properties ((optional) Sequence[int | str] | Mapping[T,Mapping[T,Mapping[str,Any]]], default = None) – Column names from pd.DataFrame to use as cell properties or a dict assigning cell_property to incidence pairs of edges and nodes. Will generate a misc_cell_properties, which may have variable lengths per cell.

  • misc_cell_properties_col ((optional) str | int, default = None) – Column name of dataframe corresponding to a column of variable length property dictionaries for the cell. Ignored for other setsystem types.

  • aggregateby ((optional) str, dict, default = 'first') – By default duplicate edge,node incidences will be dropped unless specified with aggregateby. See pandas.DataFrame.agg() methods for additional syntax and usage information.

  • edge_properties ((optional) pd.DataFrame | dict, default = None) – Properties associated with edge ids. First column of dataframe or keys of dict link to edge ids in setsystem.

  • node_properties ((optional) pd.DataFrame | dict, default = None) – Properties associated with node ids. First column of dataframe or keys of dict link to node ids in setsystem.

  • properties ((optional) pd.DataFrame | dict, default = None) – Concatenation/union of edge_properties and node_properties. By default, the object id is used and should be the first column of the dataframe, or key in the dict. If there are nodes and edges with the same ids and different properties then use the edge_properties and node_properties keywords.

  • misc_properties_col ((optional) int | str, default = None) – Column of property dataframes with dtype=dict. Intended for variable length property dictionaries for the objects.

  • edge_weight_prop_col ((optional) str | int, default = 'weight') – Name of property in edge_properties to use for weight.

  • node_weight_prop_col ((optional) str | int, default = 'weight') – Name of property in node_properties to use for weight.

  • weight_prop_col ((optional) str | int, default = 'weight') – Name of property in properties to use for weight.

  • default_edge_weight ((optional) int | float, default = 1) – Used when edge weight property is missing or undefined.

  • default_node_weight ((optional) int | float, default = 1) – Used when node weight property is missing or undefined

  • name ((optional) str, default = None) – Name assigned to hypergraph

Hypergraphs in HNX 2.0

An hnx.Hypergraph H = (V,E) references a pair of disjoint sets: V = nodes (vertices) and E = (hyper)edges.

HNX allows for multi-edges by distinguishing edges by their identifiers instead of their contents. For example, if V = {1,2,3} and E = {e1,e2,e3}, where e1 = {1,2}, e2 = {1,2}, and e3 = {1,2,3}, the edges e1 and e2 contain the same set of nodes and yet are distinct and are distinguishable within H = (V,E).

New as of version 2.0, HNX provides methods to easily store and access additional metadata such as cell, edge, and node weights. Metadata associated with (edge,node) incidences are referenced as cell_properties. Metadata associated with a single edge or node is referenced as its properties.

The fundamental object needed to create a hypergraph is a setsystem. The setsystem defines the many-to-many relationships between edges and nodes in the hypergraph. Cell properties for the incidence pairs can be defined within the setsystem or in a separate pandas.DataFrame or dict. Edge and node properties are defined with a pandas.DataFrame or dict.

SetSystems

There are five types of setsystems currently accepted by the library.

  1. iterable of iterables : Barebones hypergraph; uses pandas default indexing to generate hyperedge ids. Elements must be hashable:

    >>> H = Hypergraph([{1,2},{1,2},{1,2,3}])
    
  2. dictionary of iterables : the most basic way to express many-to-many relationships providing edge ids. The elements of the iterables must be hashable:

    >>> H = Hypergraph({'e1':[1,2],'e2':[1,2],'e3':[1,2,3]})
    
  3. dictionary of dictionaries : allows cell properties to be assigned to a specific (edge, node) incidence. This is particularly useful when there are variable length dictionaries assigned to each pair:

    >>> d = {'e1':{ 1: {'w':0.5, 'name': 'related_to'},
    ...             2: {'w':0.1, 'name': 'related_to',
    ...                 'startdate': '05.13.2020'}},
    ...      'e2':{ 1: {'w':0.52, 'name': 'owned_by'},
    ...             2: {'w':0.2}},
    ...      'e3':{ 1: {'w':0.5, 'name': 'related_to'},
    ...             2: {'w':0.2, 'name': 'owner_of'},
    ...             3: {'w':1, 'type': 'relationship'}}}
    >>> H = Hypergraph(d, cell_weight_col='w')
    
  4. pandas.DataFrame : For large datasets and for datasets with cell properties, it is most efficient to construct a hypergraph directly from a pandas.DataFrame. Incidence pairs are in the first two columns. Cell properties shared by all incidence pairs can be placed in their own column of the dataframe. Variable length dictionaries of cell properties particular to only some of the incidence pairs may be placed in a single column of the dataframe. Representing the data above as a dataframe df:

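    +------+------+------+-------------------------------------------------+
    | col1 | col2 | w    | col3                                            |
    +======+======+======+=================================================+
    | e1   | 1    | 0.5  | {'name':'related_to'}                           |
    +------+------+------+-------------------------------------------------+
    | e1   | 2    | 0.1  | {"name":"related_to", "startdate":"05.13.2020"} |
    +------+------+------+-------------------------------------------------+
    | e2   | 1    | 0.52 | {"name":"owned_by"}                             |
    +------+------+------+-------------------------------------------------+
    | e2   | 2    | 0.2  | {…}                                             |
    +------+------+------+-------------------------------------------------+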

    The first row of the dataframe is used to reference each column.

    >>> H = Hypergraph(df,edge_col="col1",node_col="col2",
    ...                 cell_weight_col="w",misc_cell_properties_col="col3")
    
  5. numpy.ndarray : For homogeneous datasets given in an ndarray, a pandas dataframe is generated and column names are added from the edge_col and node_col arguments. Cell properties containing multiple data types are added with a separate dataframe or dict and passed through the cell_properties keyword:

    >>> import numpy as np
    >>> arr = np.array([['e1','1'],['e1','2'],
    ...                 ['e2','1'],['e2','2'],
    ...                 ['e3','1'],['e3','2'],['e3','3']])
    >>> H = Hypergraph(arr, edge_col='col1', node_col='col2')
    

Edge and Node Properties

Properties specific to a single edge or node are passed through the keywords edge_properties, node_properties, and properties. Properties may be passed as dataframes or dicts. The first column or index of the dataframe, or the keys of the dict, correspond to the edge and/or node identifiers. If identifiers are shared among edges and nodes, or are distinct for edges and nodes, properties may be combined into a single object and passed to the properties keyword. For example:

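+----+--------+---------------------------------+
| id | weight | properties                      |
+====+========+=================================+
| e1 | 5.0    | {'type':'event'}                |
+----+--------+---------------------------------+
| e2 | 0.52   | {"name":"owned_by"}             |
+----+--------+---------------------------------+
| …  | …      | {…}                             |
+----+--------+---------------------------------+
| 1  | 1.2    | {'color':'red'}                 |
+----+--------+---------------------------------+
| 2  | .003   | {'name':'Fido','color':'brown'} |
+----+--------+---------------------------------+
| 3  | 1.0    | {}                              |
+----+--------+---------------------------------+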

A properties dictionary should have the format:

dp = {id1 : {prop1:val1, prop2:val2,...}, id2 : ... }

A properties dataframe may be used for nodes and edges sharing ids but differing in cell properties by adding a level index using 0 for edges and 1 for nodes:

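+-------+----+--------+---------------------------------+
| level | id | weight | properties                      |
+=======+====+========+=================================+
| 0     | e1 | 5.0    | {'type':'event'}                |
+-------+----+--------+---------------------------------+
| 0     | e2 | 0.52   | {"name":"owned_by"}             |
+-------+----+--------+---------------------------------+
| …     | …  | …      | {…}                             |
+-------+----+--------+---------------------------------+
| 1     | 1  | 1.2    | {'color':'red'}                 |
+-------+----+--------+---------------------------------+
| 1     | 2  | .003   | {'name':'Fido','color':'brown'} |
+-------+----+--------+---------------------------------+
| …     | …  | …      | {…}                             |
+-------+----+--------+---------------------------------+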

Weights

The default key for cell and object weights is “weight”. The default value is 1. Weights may be assigned and/or a new default prescribed in the constructor using cell_weight_col and cell_weights for incidence pairs, and using edge_weight_prop_col, node_weight_prop_col, weight_prop_col, default_edge_weight, and default_node_weight for node and edge weights.

adjacency_matrix(s=1, index=False, remove_empty_rows=False)[source]

The s-adjacency matrix for the hypergraph.

Parameters:
  • s (int, optional, default = 1) –

  • index (boolean, optional, default = False) – if True, will return the index of ids for rows and columns

  • remove_empty_rows (boolean, optional, default = False) –

Returns:

  • adjacency_matrix (scipy.sparse.csr.csr_matrix)

  • node_index (list) – index of ids for rows and columns
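
As a rough illustration of s-adjacency between nodes, the sketch below builds a dense 0/1 matrix from a dictionary of edges in plain Python. It assumes that two distinct nodes are s-adjacent when they share at least s edges and that entries are binary; the function name is hypothetical and the library itself returns a scipy sparse matrix instead.

```python
from itertools import combinations


def s_adjacency(incidence, s=1):
    """Return (matrix, node_index) for a dict {edge id: iterable of nodes}."""
    nodes = sorted({n for members in incidence.values() for n in members})
    position = {n: i for i, n in enumerate(nodes)}
    shared = [[0] * len(nodes) for _ in nodes]
    # Count, for every pair of nodes, how many edges contain both.
    for members in incidence.values():
        for u, v in combinations(sorted(set(members)), 2):
            shared[position[u]][position[v]] += 1
            shared[position[v]][position[u]] += 1
    matrix = [[1 if count >= s else 0 for count in row] for row in shared]
    return matrix, nodes


incidence = {"e1": [1, 2], "e2": [1, 2], "e3": [1, 2, 3]}
A1, node_index = s_adjacency(incidence, s=1)
A2, _ = s_adjacency(incidence, s=2)
```

Nodes 1 and 2 share three edges, so they remain adjacent for s=2, while node 3 shares only one edge with each of the others and becomes isolated.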

auxiliary_matrix(s=1, node=True, index=False)[source]

The unweighted s-edge or node auxiliary matrix for hypergraph

Parameters:
  • s (int, optional, default = 1) –

  • node (bool, optional, default = True) – whether to return based on node or edge adjacencies

Returns:

  • auxiliary_matrix (scipy.sparse.csr.csr_matrix) – Node/Edge adjacency matrix with empty rows and columns removed

  • index (np.array) – row and column index of userids

bipartite()[source]

Constructs the NetworkX bipartite graph associated to hypergraph.

Returns:

bipartite

Return type:

nx.Graph()

Notes

Creates a bipartite networkx graph from hypergraph. The nodes and (hyper)edges of hypergraph become the nodes of bipartite graph. For every (hyper)edge e in the hypergraph and node n in e there is an edge (n,e) in the graph.
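
The correspondence described in these Notes can be written down without NetworkX. The sketch below, with a hypothetical function name, lists the two sides of the bipartition and one (n, e) link per incidence; it is not the library's implementation.

```python
def bipartite_links(incidence):
    """Return (node side, edge side, links) for {edge id: iterable of nodes}."""
    edge_side = set(incidence)
    node_side = {n for members in incidence.values() for n in members}
    # One link (n, e) for every node n contained in hyperedge e.
    links = sorted((n, e) for e, members in incidence.items() for n in members)
    return node_side, edge_side, links


incidence = {"e1": [1, 2], "e2": [2, 3]}
node_side, edge_side, links = bipartite_links(incidence)
```

Every hyperedge and every node becomes a vertex on its own side, and the number of links equals the number of incidences.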

collapse_edges(name=None, return_equivalence_classes=False, use_reps=None, return_counts=None)[source]

Constructs a new hypergraph obtained by identifying edges containing the same nodes

Parameters:
  • name (hashable, optional, default = None) –

  • return_equivalence_classes (boolean, optional, default = False) – Returns a dictionary of edge equivalence classes keyed by frozen sets of nodes

Returns:

  • new hypergraph (Hypergraph) – Equivalent edges are collapsed to a single edge named by a representative of the equivalent edges followed by a colon and the number of edges it represents.

  • equivalence_classes (dict) – A dictionary keyed by representative edge names with values equal to the edges in its equivalence class

Notes

Two edges are identified if their respective elements are the same. Using this as an equivalence relation, the uids of the edges are partitioned into equivalence classes.

The uid of the new edge is a single edge from the collapsed edges, followed by a colon and the number of elements in its equivalence class.

collapse_nodes(name=None, return_equivalence_classes=False, use_reps=None, return_counts=None) Hypergraph[source]

Constructs a new hypergraph obtained by identifying nodes contained by the same edges

Parameters:
  • name (str, optional, default = None) –

  • return_equivalence_classes (boolean, optional, default = False) – Returns a dictionary of node equivalence classes keyed by frozen sets of edges

  • use_reps (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE] Choose a single element from the collapsed nodes as uid for the new node, otherwise uses a frozen set of the uids of nodes in the equivalence class. If use_reps is True the new nodes have uids given by a tuple of the rep and the count

  • return_counts (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE]

Returns:

new hypergraph

Return type:

Hypergraph

Notes

Two nodes are identified if their respective memberships are the same. Using this as an equivalence relation, the uids of the nodes are partitioned into equivalence classes. A single member of the equivalence class is chosen to represent the class followed by the number of members of the class.

Example

>>> data = {'E1': ('a', 'b'), 'E2': ('a', 'b')}
>>> h = Hypergraph(data)
>>> h.collapse_nodes().incidence_dict
{'E1': ['a: 2'], 'E2': ['a: 2']}

collapse_nodes_and_edges(name=None, return_equivalence_classes=False, use_reps=None, return_counts=None)[source]

Returns a new hypergraph by collapsing nodes and edges.

Parameters:
  • name (str, optional, default = None) –

  • return_equivalence_classes (boolean, optional, default = False) – Returns a dictionary of edge equivalence classes keyed by frozen sets of nodes

  • use_reps (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE] Choose a single element from the collapsed elements as a representative. If use_reps is True, the new elements are keyed by a tuple of the rep and the count.

  • return_counts (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE]

Returns:

new hypergraph

Return type:

Hypergraph

Notes

Collapses the Nodes and Edges of EntitySets. Two nodes(edges) are duplicates if their respective memberships(elements) are the same. Using this as an equivalence relation, the uids of the nodes(edges) are partitioned into equivalence classes. A single member of the equivalence class is chosen to represent the class followed by the number of members of the class.

Example

>>> data = {'E1': ('a', 'b'), 'E2': ('a', 'b')}
>>> h = Hypergraph(data)
>>> h.incidence_dict
{'E1': ['a', 'b'], 'E2': ['a', 'b']}
>>> h.collapse_nodes_and_edges().incidence_dict
{'E1: 2': ['a: 2']}

component_subgraphs(return_singletons=False, name=None)[source]

Same as s_component_subgraphs() with s=1. Returns iterator.

components(edges=False)[source]

Same as s_connected_components() with s=1, but nodes are returned by default. Returns iterator.

connected_component_subgraphs(return_singletons=True, name=None)[source]

Same as s_component_subgraphs() with s=1. Returns iterator

connected_components(edges=False)[source]

Same as s_connected_components() with s=1, but nodes are returned by default. Returns iterator.

property dataframe

Returns dataframe of incidence pairs and their properties.

Return type:

pd.DataFrame

degree(node, s=1, max_size=None)[source]

The number of edges of size at least s (and at most max_size, if given) that contain node.

Parameters:
  • node (hashable) – identifier for the node.

  • s (positive integer, optional, default 1) – smallest size of edge to consider in degree

  • max_size (positive integer or None, optional, default = None) – largest size of edge to consider in degree

Return type:

int
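
The role of s and max_size as lower and upper bounds on edge size can be illustrated with a short plain-Python sketch. The function below is a hypothetical stand-in operating on a dictionary of edges, not the Hypergraph method itself.

```python
def degree(incidence, node, s=1, max_size=None):
    """Count edges containing `node` whose size lies in [s, max_size]."""
    count = 0
    for members in incidence.values():
        size = len(set(members))
        if node not in members or size < s:
            continue
        if max_size is not None and size > max_size:
            continue
        count += 1
    return count


incidence = {"e1": [1, 2], "e2": [1, 2], "e3": [1, 2, 3], "e4": [3]}
deg_default = degree(incidence, 1)
deg_large = degree(incidence, 1, s=3)
deg_singleton = degree(incidence, 3, s=1, max_size=1)
```

Node 1 lies in three edges overall, but only e3 has three or more members; node 3 lies in exactly one edge of size 1.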

diameter(s=1)[source]

Returns the length of the longest shortest s-walk between nodes in hypergraph

Parameters:

s (int, optional, default 1) –

Returns:

diameter

Return type:

int

Raises:

HyperNetXError – If hypergraph is not s-edge-connected

Notes

Two nodes are s-adjacent if they share at least s edges. Two nodes v_start and v_end are s-walk connected if there is a sequence of nodes v_start, v_1, v_2, … v_n-1, v_end such that consecutive nodes are s-adjacent. If the graph is not connected, an error will be raised.

dim(edge)[source]

Same as size(edge)-1.

distance(source, target, s=1)[source]

Returns the shortest s-walk distance between two nodes in the hypergraph.

Parameters:
  • source (node.uid or node) – a node in the hypergraph

  • target (node.uid or node) – a node in the hypergraph

  • s (positive integer) – the number of edges

Returns:

s-walk distance

Return type:

int

See also

edge_distance

Notes

The s-distance is the shortest s-walk length between the nodes. An s-walk between nodes is a sequence of nodes that pairwise share at least s edges. The length of the shortest s-walk is 1 less than the number of nodes in the path sequence.

Uses the networkx shortest_path_length method on the graph generated by the s-adjacency matrix.
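
The definition in these Notes can be followed directly with a breadth-first search. The sketch below is plain Python rather than the networkx-based approach the library uses; the function name is hypothetical, and returning None when no s-walk exists is an assumption made only for this illustration.

```python
from collections import deque


def s_distance(incidence, source, target, s=1):
    """Shortest s-walk length between two nodes of {edge id: nodes}."""
    # memberships[n] is the set of edges that contain node n.
    memberships = {}
    for edge, members in incidence.items():
        for n in members:
            memberships.setdefault(n, set()).add(edge)

    def s_neighbors(u):
        return [v for v in memberships
                if v != u and len(memberships[u] & memberships[v]) >= s]

    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in s_neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # assumption: no s-walk between source and target


incidence = {"e1": [1, 2], "e2": [1, 2], "e3": [2, 3], "e4": [3, 4]}
d_1 = s_distance(incidence, 1, 4, s=1)
d_2 = s_distance(incidence, 1, 2, s=2)
d_none = s_distance(incidence, 1, 3, s=2)
```

For s=1 the walk 1, 2, 3, 4 has four nodes and therefore length 3. For s=2 only nodes 1 and 2 share two edges, so node 3 is unreachable from node 1.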

dual(name=None, switch_names=True)[source]

Constructs a new hypergraph with roles of edges and nodes of hypergraph reversed.

Parameters:
  • name (hashable, optional) –

  • switch_names (bool, optional, default = True) – reverses edge_col and node_col names unless edge_col = ‘edges’ and node_col = ‘nodes’

Return type:

hypergraph
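
Reversing the roles of edges and nodes amounts to inverting the incidence dictionary. A minimal plain-Python sketch, with a hypothetical function name and without the column renaming controlled by switch_names:

```python
def dual_incidence(incidence):
    """Invert {edge id: nodes} into {node id: edges containing it}."""
    flipped = {}
    for edge, members in incidence.items():
        for node in members:
            flipped.setdefault(node, []).append(edge)
    return flipped


incidence = {"e1": [1, 2], "e2": [1, 2], "e3": [1, 2, 3]}
dual = dual_incidence(incidence)
```

Applying the same inversion twice gives back the original dictionary in this example.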

edge_adjacency_matrix(s=1, index=False)[source]

The s-adjacency matrix for the dual hypergraph.

Parameters:
  • s (int, optional, default 1) –

  • index (boolean, optional, default = False) – if True, will return the index of ids for rows and columns

Returns:

  • edge_adjacency_matrix (scipy.sparse.csr.csr_matrix)

  • edge_index (list) – index of ids for rows and columns

Notes

This is also the adjacency matrix for the line graph. Two edges are s-adjacent if they share at least s nodes. If remove_zeros is True, will return the auxiliary matrix.

edge_diameter(s=1)[source]

Returns the length of the longest shortest s-walk between edges in hypergraph

Parameters:

s (int, optional, default 1) –

Returns:

edge_diameter

Return type:

int

Raises:

HyperNetXError – If hypergraph is not s-edge-connected

Notes

Two edges are s-adjacent if they share at least s nodes. Two edges e_start and e_end are s-walk connected if there is a sequence of edges e_start, e_1, e_2, … e_n-1, e_end such that consecutive edges are s-adjacent. If the graph is not connected, an error will be raised.

edge_diameters(s=1)[source]

Returns the edge diameters of the s_edge_connected component subgraphs in hypergraph.

Parameters:

s (int, optional, default 1) –

Returns:

  • maximum diameter (int)

  • list of diameters (list) – List of edge_diameters for s-edge component subgraphs in hypergraph

  • list of component (list) – List of the edge uids in the s-edge component subgraphs.

edge_distance(source, target, s=1)[source]

Returns the shortest s-walk distance between two edges in the hypergraph.

TODO: still need to return path and translate into user-defined nodes and edges.

Parameters:
  • source (edge.uid or edge) – an edge in the hypergraph

  • target (edge.uid or edge) – an edge in the hypergraph

  • s (positive integer) – the number of intersections between pairwise consecutive edges

  • TODO (add edge weights) –

  • weight (None or string, optional, default = None) – if None then all edges have weight 1. If string then edge attribute string is used if available.

Returns:

s-walk distance – A shortest s-walk is computed as a sequence of edges; the s-walk distance is the number of edges in the sequence minus 1. If no such path exists, returns np.inf.

Return type:

int, or np.inf if no such path exists

See also

distance

Notes

The s-distance is the shortest s-walk length between the edges. An s-walk between edges is a sequence of edges such that consecutive pairwise edges intersect in at least s nodes. The length of the shortest s-walk is 1 less than the number of edges in the path sequence.

Uses the networkx shortest_path_length method on the graph generated by the s-edge_adjacency matrix.

edge_neighbors(edge, s=1)[source]

The edges in the hypergraph that share at least s node(s) with edge.

Parameters:
  • edge (hashable or Entity) – uid for an edge in hypergraph or the edge Entity

  • s (int, list, optional, default = 1) – Minimum number of nodes shared by edge and each of its neighbors.

Returns:

List of edge neighbors

Return type:

list

property edge_props

Dataframe of edge properties indexed on edge ids

Return type:

pd.DataFrame

edge_size_dist()[source]

Returns the size for each edge

Return type:

np.array

property edges

Object associated with self._edges.

Return type:

EntitySet

classmethod from_bipartite(B, set_names=('edges', 'nodes'), name=None, **kwargs)[source]

Class method that creates a Hypergraph from a bipartite graph.

Parameters:
  • B (nx.Graph()) – A networkx bipartite graph. Each node in the graph has a property ‘bipartite’ taking the value of 0 or 1 indicating a 2-coloring of the graph.

  • set_names (iterable of length 2, optional, default = ['edges','nodes']) – Category names assigned to the graph nodes associated to each bipartite set

  • name (hashable, optional) –

Return type:

Hypergraph

Notes

A partition for the nodes in a bipartite graph generates a hypergraph.

>>> import networkx as nx
>>> B = nx.Graph()
>>> B.add_nodes_from([1, 2, 3, 4], bipartite=0)
>>> B.add_nodes_from(['a', 'b', 'c'], bipartite=1)
>>> B.add_edges_from([(1, 'a'), (1, 'b'), (2, 'b'), (2, 'c'),
...                   (3, 'c'), (4, 'a')])
>>> H = Hypergraph.from_bipartite(B)
>>> H.nodes, H.edges
# output: (EntitySet(_:Nodes,[1, 2, 3, 4],{}),
# EntitySet(_:Edges,['b', 'c', 'a'],{}))

classmethod from_incidence_dataframe(df, columns=None, rows=None, edge_col: str = 'edges', node_col: str = 'nodes', name=None, fillna=0, transpose=False, transforms=[], key=None, return_only_dataframe=False, **kwargs)[source]

Create a hypergraph from a Pandas Dataframe object, which has values equal to the incidence matrix of a hypergraph. Its index will identify the nodes and its columns will identify its edges.

Parameters:
  • df (pandas.DataFrame) – a real valued dataframe with a single index

  • columns ((optional) list, default = None) – restricts df to the columns with headers in this list.

  • rows ((optional) list, default = None) – restricts df to the rows indexed by the elements in this list.

  • name ((optional) string, default = None) –

  • fillna (float, default = 0) – a real value to place in empty cell, all-zero columns will not generate an edge.

  • transpose ((optional) bool, default = False) – option to transpose the dataframe; in this case df.index will identify the edges and df.columns will identify the nodes. The transpose is applied before transforms and key.

  • transforms ((optional) list, default = []) – optional list of transformations to apply to each column, of the dataframe using pd.DataFrame.apply(). Transformations are applied in the order they are given (ex. abs). To apply transforms to rows or for additional functionality, consider transforming df using pandas.DataFrame methods prior to generating the hypergraph.

  • key ((optional) function, default = None) – boolean function to be applied to dataframe. will be applied to entire dataframe.

  • return_only_dataframe ((optional) bool, default = False) – to use the incidence_dataframe with cell_properties or properties, set this to True and use it as the setsystem in the Hypergraph constructor.

See also

from_numpy_array

Return type:

Hypergraph

classmethod from_incidence_matrix(M, node_names=None, edge_names=None, node_label='nodes', edge_label='edges', name=None, key=None, **kwargs)[source]

Same as from_numpy_array.

classmethod from_numpy_array(M, node_names=None, edge_names=None, node_label='nodes', edge_label='edges', name=None, key=None, **kwargs)[source]

Create a hypergraph from a real valued matrix represented as a 2-dimensional numpy array. The matrix is converted to a matrix of 0’s and 1’s so that any truthy cells are converted to 1’s and all others to 0’s.

Parameters:
  • M (real valued array-like object, 2 dimensions) – representing a real valued matrix with rows corresponding to nodes and columns to edges

  • node_names (object, array-like, default=None) – List of node names must be the same length as M.shape[0]. If None then the node names correspond to row indices with ‘v’ prepended.

  • edge_names (object, array-like, default=None) – List of edge names must have the same length as M.shape[1]. If None then the edge names correspond to column indices with ‘e’ prepended.

  • name (hashable) –

  • key ((optional) function) – boolean function to be evaluated on each cell of the array, must be applicable to numpy.array

Return type:

Hypergraph

Note

The constructor does not generate empty edges. All zero columns in M are removed and the names corresponding to these edges are discarded.
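
The conversion from a matrix to a set system, including the default naming and the removal of all-zero columns, can be sketched in plain Python. This hypothetical helper works on a list of lists rather than a numpy array and ignores the key argument.

```python
def incidence_from_matrix(M, node_names=None, edge_names=None):
    """Rows are nodes, columns are edges; truthy cells mark incidences."""
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    # Default names: row/column indices with 'v' or 'e' prepended.
    node_names = node_names or [f"v{i}" for i in range(n_rows)]
    edge_names = edge_names or [f"e{j}" for j in range(n_cols)]
    incidence = {}
    for j, edge in enumerate(edge_names):
        members = [node_names[i] for i in range(n_rows) if M[i][j]]
        if members:  # all-zero columns do not produce an edge
            incidence[edge] = members
    return incidence


M = [[1, 0, 0.5],
     [1, 0, 0],
     [0, 0, 2]]
incidence = incidence_from_matrix(M)
```

The middle column is all zeros, so no edge named e1 appears in the result; the non-integer entries 0.5 and 2 still count as incidences because they are truthy.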

get_cell_properties(edge: str, node: str, prop_name: str | None = None) Any | dict[str, Any][source]

Get cell properties on a specified edge and node

Parameters:
  • edge (str) – edgeid

  • node (str) – nodeid

  • prop_name (str, optional) – name of a cell property; if None, all cell properties will be returned

Returns:

cell property value if prop_name is provided, otherwise dict of all cell properties and values

Return type:

int or str or dict of {str: any}

get_linegraph(s=1, edges=True)[source]

Creates an s-linegraph for the Hypergraph. If edges=True (default), then the edges will be the vertices of the line graph. Two vertices are connected by an s-line-graph edge if the corresponding hypergraph edges intersect in at least s hypergraph nodes. If edges=False, the hypergraph nodes will be the vertices of the line graph. Two vertices are connected if the nodes they correspond to share at least s incident hyperedges.

Parameters:
  • s (int) – The width of the connections.

  • edges (bool, optional, default = True) – Determine if edges or nodes will be the vertices in the linegraph.

Returns:

A NetworkX graph.

Return type:

nx.Graph
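
The adjacency rule for the s-linegraph can be sketched on a plain dictionary of sets. This is an illustration under stated assumptions, not the library's code: the function name s_linegraph_edges is hypothetical, and it returns only the set of line-graph edges, so isolated line-graph vertices are not represented.

```python
from itertools import combinations


def s_linegraph_edges(setsystem, s=1, edges=True):
    """Hypothetical helper: line-graph edges as frozensets of two vertex ids."""
    if not edges:
        # Work on the dual: each node maps to the set of hyperedges containing it.
        dual = {}
        for e, members in setsystem.items():
            for v in members:
                dual.setdefault(v, set()).add(e)
        setsystem = dual
    # Two vertices are adjacent when their sets share at least s elements.
    return {
        frozenset((a, b))
        for a, b in combinations(setsystem, 2)
        if len(set(setsystem[a]) & set(setsystem[b])) >= s
    }


S = {'A': {1, 2, 3}, 'B': {2, 3, 4}, 'C': {5, 6}, 'D': {6}}
lg1 = s_linegraph_edges(S, s=1)               # A-B and C-D are adjacent
lg2 = s_linegraph_edges(S, s=2)               # only A-B share two nodes
lg2_nodes = s_linegraph_edges(S, s=2, edges=False)  # only nodes 2 and 3 share two edges
```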

get_properties(id, level=None, prop_name=None)[source]

Returns an object’s specific property or all properties

Parameters:
  • id (hashable) – edge or node id

  • level (int | None , optional, default = None) – if separate edge and node properties then enter 0 for edges and 1 for nodes.

  • prop_name (str | None, optional, default = None) – if None then all properties associated with the object will be returned.

Returns:

single property or dictionary of properties

Return type:

str or dict

incidence_dataframe(sort_rows=False, sort_columns=False, cell_weights=True)[source]

Returns a pandas dataframe for hypergraph indexed by the nodes and with column headers given by the edge names.

Parameters:
  • sort_rows (bool, optional, default = False) – sort rows based on hashable node names

  • sort_columns (bool, optional, default = False) – sort columns based on hashable edge names

  • cell_weights (bool, optional, default = True) –

property incidence_dict

Dictionary keyed by edge uids with values the uids of nodes in each edge

Return type:

dict

incidence_matrix(weights=False, index=False)[source]

An incidence matrix for the hypergraph indexed by nodes x edges.

Parameters:
  • weights (bool, default =False) – If False all nonzero entries are 1. If True and self.static all nonzero entries are filled by self.edges.cell_weights dictionary values.

  • index (boolean, optional, default = False) – If True return will include a dictionary of node uid : row number and edge uid : column number

Returns:

  • incidence_matrix (scipy.sparse.csr.csr_matrix or np.ndarray)

  • row_index (list) – index of node ids for rows

  • col_index (list) – index of edge ids for columns
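
As a rough illustration of the nodes x edges layout and the accompanying row and column indices, the sketch below builds a dense 0/1 matrix from an incidence dictionary. It is not the library's implementation (which returns a sparse matrix); the name incidence_matrix_dense is hypothetical, and sorting the node ids assumes they are mutually comparable.

```python
def incidence_matrix_dense(incidence_dict):
    """Hypothetical helper: dense 0/1 nodes x edges matrix with its indices."""
    col_index = list(incidence_dict)  # edge ids in column order
    # Node ids in row order (assumption: ids can be sorted).
    row_index = sorted({v for members in incidence_dict.values() for v in members})
    row_of = {v: i for i, v in enumerate(row_index)}
    matrix = [[0] * len(col_index) for _ in row_index]
    for j, e in enumerate(col_index):
        for v in incidence_dict[e]:
            matrix[row_of[v]][j] = 1  # node v belongs to edge e
    return matrix, row_index, col_index


matrix, row_index, col_index = incidence_matrix_dense({'A': {1, 2, 3}, 'B': {2, 3, 4}})
```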

is_connected(s=1, edges=False)[source]

Determines if hypergraph is s-connected.

Parameters:
  • s (int, optional, default 1) –

  • edges (boolean, optional, default = False) – If True, will determine if s-edge-connected. For s=1 s-edge-connected is the same as s-connected.

Returns:

is_connected

Return type:

boolean

Notes

A hypergraph is s node connected if for any two nodes v0,vn there exists a sequence of nodes v0,v1,v2,…,v(n-1),vn such that every consecutive pair of nodes v(i),v(i+1) share at least s edges.

A hypergraph is s edge connected if for any two edges e0,en there exists a sequence of edges e0,e1,e2,…,e(n-1),en such that every consecutive pair of edges e(i),e(i+1) share at least s nodes.

neighbors(node, s=1)[source]

The nodes in hypergraph which share s edge(s) with node.

Parameters:
  • node (hashable or Entity) – uid for a node in hypergraph or the node Entity

  • s (int, list, optional, default = 1) – Minimum number of edges shared by neighbors with node.

Returns:

neighbors – s-neighbors share at least s edges in the hypergraph

Return type:

list
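
The s-neighbor rule can be sketched by counting, for every other node, how many edges it shares with the given node. This is a minimal sketch on a plain dictionary of sets; the function name s_neighbors is hypothetical and the code is not taken from the library.

```python
def s_neighbors(setsystem, node, s=1):
    """Hypothetical helper: nodes sharing at least s edges with `node`."""
    shared = {}
    for members in setsystem.values():
        if node in members:
            for other in members:
                if other != node:
                    # Count one more edge shared between `node` and `other`.
                    shared[other] = shared.get(other, 0) + 1
    return [v for v, count in shared.items() if count >= s]


S = {'A': {1, 2, 3}, 'B': {2, 3, 4}, 'C': {5, 6}, 'D': {6}}
one_neighbors = sorted(s_neighbors(S, 2, s=1))
two_neighbors = s_neighbors(S, 2, s=2)
```

Node 2 shares edge A with nodes 1 and 3 and edge B with nodes 3 and 4, so only node 3 remains for s=2.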

node_diameters(s=1)[source]

Returns the node diameters of the connected components in hypergraph.

Returns:

list of the diameters of the s-components and list of the s-component nodes

property node_props

Dataframe of node properties indexed on node ids

Return type:

pd.DataFrame

property nodes

Object associated with self._nodes.

Return type:

EntitySet

number_of_edges(edgeset=None)[source]

The number of edges in edgeset belonging to hypergraph.

Parameters:

edgeset (an iterable of Entities, optional, default = None) – If None, then return the number of edges in hypergraph.

Returns:

number_of_edges

Return type:

int

number_of_nodes(nodeset=None)[source]

The number of nodes in nodeset belonging to hypergraph.

Parameters:

nodeset (an iterable of Entities, optional, default = None) – If None, then return the number of nodes in hypergraph.

Returns:

number_of_nodes

Return type:

int

order()[source]

The number of nodes in hypergraph.

Returns:

order

Return type:

int

property properties

Returns dataframe of edge and node properties.

Return type:

pd.DataFrame

remove(keys, level=None, name=None)[source]

Creates a new hypergraph with nodes and/or edges indexed by keys removed. More efficient for creating a restricted hypergraph if the restricted set is greater than what is being removed.

Parameters:
  • keys (list | tuple | set | Hashable) – node and/or edge id(s) to remove

  • level (None, optional) – Enter 0 to remove edges with ids in keys. Enter 1 to remove nodes with ids in keys. If None then all objects in nodes and edges with the id will be removed.

  • name (str, optional) – Name of new hypergraph

Return type:

hnx.Hypergraph

remove_edges(keys, name=None)[source]

remove_nodes(keys, name=None)[source]

remove_singletons(name=None)[source]

Constructs clone of hypergraph with singleton edges removed.

Returns:

new hypergraph

Return type:

Hypergraph

restrict_to_edges(edges, name=None)[source]

New hypergraph obtained by restricting to edges

Parameters:

edges (Iterable) – edgeids to restrict to

Return type:

hnx.Hypergraph

restrict_to_nodes(nodes, name=None)[source]

New hypergraph obtained by restricting to nodes

Parameters:

nodes (Iterable) – nodeids to restrict to

Return type:

hnx.Hypergraph

s_component_subgraphs(s=1, edges=True, return_singletons=False, name=None)[source]

Returns a generator for the induced subgraphs of s_connected components. Removes singletons unless return_singletons is set to True. Computed using s-linegraph generated either by the hypergraph (edges=True) or its dual (edges = False)

Parameters:
  • s (int, optional, default 1) –

  • edges (boolean, optional, default = True) – Determines if edge or node components are desired. Returns subgraphs equal to the hypergraph restricted to each set of nodes (edges) in the s-connected components or s-edge-connected components

  • return_singletons (bool, optional) –

Yields:

s_component_subgraphs (iterator) – Iterator returns subgraphs generated by the edges (or nodes) in the s-edge(node) components of hypergraph.

s_components(s=1, edges=True, return_singletons=True)[source]

Same as s_connected_components

s_connected_components(s=1, edges=True, return_singletons=False)[source]

Returns a generator for the s-edge-connected components or the s-node-connected components of the hypergraph.

Parameters:
  • s (int, optional, default 1) –

  • edges (boolean, optional, default = True) – If True will return edge components, if False will return node components

  • return_singletons (bool, optional, default = False) –

Notes

If edges=True, this method yields the s-edge-connected components as sets of edge uids. An s-edge-component has the property that for any two edges e1 and e2 there is a sequence of edges starting with e1 and ending with e2 such that pairwise adjacent edges in the sequence intersect in at least s nodes. If s=1 these are the path components of the hypergraph.

If edges=False, this method yields the s-node-connected components as sets of uids of nodes which are s-walk-connected. Two nodes v1 and v2 are s-walk-connected if there is a sequence of nodes starting with v1 and ending with v2 such that pairwise adjacent nodes in the sequence share at least s edges. If s=1 these are the path components of the hypergraph.

Example

>>> S = {'A':{1,2,3},'B':{2,3,4},'C':{5,6},'D':{6}}
>>> H = Hypergraph(S)
>>> list(H.s_components(edges=True))
[{'C', 'D'}, {'A', 'B'}]
>>> list(H.s_components(edges=False))
[{1, 2, 3, 4}, {5, 6}]

Yields:

s_connected_components (iterator) – Iterator returns sets of uids of the edges (or nodes) in the s-edge(node) components of hypergraph.
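
The definition in the notes can be turned into a small traversal over a dictionary of sets. The sketch below is an illustration only, not the library's algorithm (the library computes components from the s-linegraph); the generator name s_components_sketch is hypothetical, and the order in which components are yielded is not specified.

```python
def s_components_sketch(setsystem, s=1, edges=True, return_singletons=True):
    """Hypothetical generator: yield s-edge (or s-node) components as sets of uids."""
    if not edges:
        # Node components are edge components of the dual (node -> incident edges).
        dual = {}
        for e, members in setsystem.items():
            for v in members:
                dual.setdefault(v, set()).add(e)
        setsystem = dual
    items = {k: set(v) for k, v in setsystem.items()}
    unvisited = set(items)
    while unvisited:
        start = unvisited.pop()
        component, stack = {start}, [start]
        while stack:
            current = stack.pop()
            # Unvisited items sharing at least s elements with the current one.
            linked = {k for k in unvisited if len(items[current] & items[k]) >= s}
            unvisited -= linked
            component |= linked
            stack.extend(linked)
        if return_singletons or len(component) > 1:
            yield component


S = {'A': {1, 2, 3}, 'B': {2, 3, 4}, 'C': {5, 6}, 'D': {6}}
edge_comps = sorted(sorted(c) for c in s_components_sketch(S, edges=True))
node_comps = sorted(sorted(c) for c in s_components_sketch(S, edges=False))
```

On the set system from the example above, this groups the edges into {A, B} and {C, D} and the nodes into {1, 2, 3, 4} and {5, 6}.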

set_state(**kwargs)[source]

Allow state_dict updates from outside of class. Use with caution.

Parameters:

**kwargs – key=value pairs to save in state dictionary

property shape

(number of nodes, number of edges)

Return type:

tuple

singletons()[source]

Returns a list of singleton edges. A singleton edge is an edge of size 1 with a node of degree 1.

Returns:

singles – A list of edge uids.

Return type:

list
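
The singleton condition combines two checks: the edge has exactly one node, and that node belongs to no other edge. A minimal sketch on a dictionary of sets follows; the function name singleton_edges is hypothetical and the code is not the library's implementation.

```python
def singleton_edges(setsystem):
    """Hypothetical helper: edges of size 1 whose only node has degree 1."""
    degree = {}
    for members in setsystem.values():
        for v in members:
            # Degree of a node = number of edges containing it.
            degree[v] = degree.get(v, 0) + 1
    return [
        e
        for e, members in setsystem.items()
        if len(members) == 1 and all(degree[v] == 1 for v in members)
    ]


S2 = {'A': {1, 2}, 'B': {2}, 'C': {7}}
singles = singleton_edges(S2)
```

Edge B has size 1, but its node 2 also lies in edge A, so only C qualifies.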

size(edge, nodeset=None)[source]

The number of nodes in nodeset that belong to edge. If nodeset is None, then returns the size of edge.

Parameters:

edge (hashable) – The uid of an edge in the hypergraph

Returns:

size

Return type:

int

toplexes(name=None)[source]

Returns a simple hypergraph corresponding to self.

Warning

Collapsing is no longer supported inside the toplexes method. Instead generate a new collapsed hypergraph and compute the toplexes of the new hypergraph.

Parameters:

name (str, optional, default = None) –