classes package
Submodules
classes.entity module
- class classes.entity.Entity(entity: DataFrame | Mapping[T, Iterable[T]] | Iterable[Iterable[T]] | None = None, data_cols: Sequence[T] = [0, 1], data: ndarray | None = None, static: bool = False, labels: OrderedDict[T, Sequence[T]] | None = None, uid: Hashable | None = None, weight_col: str | int | None = 'cell_weights', weights: Sequence[float] | float | int | str | None = 1, aggregateby: str | dict | None = 'sum', properties: DataFrame | dict[int, dict[T, dict[Any, Any]]] | None = None, misc_props_col: str = 'properties', level_col: str = 'level', id_col: str = 'id')[source]
Bases: object
Base class for handling N-dimensional data when building network-like models, i.e., Hypergraph
- Parameters:
entity (pandas.DataFrame, dict of lists or sets, list of lists or sets, optional) – If a DataFrame with N columns, represents N-dimensional entity data (data table). Otherwise, represents 2-dimensional entity data (system of sets). TODO: Test for compatibility with list of Entities and update docs
data (numpy.ndarray, optional) – 2D M x N ndarray of ints (data table); sparse representation of an N-dimensional incidence tensor with M nonzero cells. Ignored if entity is provided.
static (bool, default=False) – If True, entity data may not be altered, and the state_dict will never be cleared. Otherwise, rows may be added to and removed from the data table, and updates will clear the state_dict.
labels (collections.OrderedDict of lists, optional) – User-specified labels in corresponding order to ints in data. Ignored if entity is provided or data is not provided.
uid (hashable, optional) – A unique identifier for the object
weights (str or sequence of float, optional) – User-specified cell weights corresponding to entity data.
- If sequence of floats and entity or data defines a data table, length must equal the number of rows.
- If sequence of floats and entity defines a system of sets, length must equal the total sum of the sizes of all sets.
- If str and entity is a DataFrame, must be the name of a column in entity.
Otherwise, weight for all cells is assumed to be 1.
aggregateby ({'sum', 'last', 'count', 'mean', 'median', 'max', 'min', 'first', None}) – Name of function to use for aggregating cell weights of duplicate rows when entity or data defines a data table, default is 'sum'. If None, duplicate rows will be dropped without aggregating cell weights. Effectively ignored if entity defines a system of sets.
properties (pandas.DataFrame or doubly-nested dict, optional) – User-specified properties to be assigned to individual items in the data, i.e., cell entries in a data table; sets or set elements in a system of sets. See Notes for detailed explanation. If DataFrame, each row gives [optional item level, item label, optional named properties, {property name: property value}] (order of columns does not matter; see Notes for an example). If doubly-nested dict, {item level: {item label: {property name: property value}}}.
misc_props_col, level_col, id_col (str, default="properties", "level", "id") – Column names for miscellaneous properties, level index, and item name in properties; see Notes for explanation.
Notes
A property is a named attribute assigned to a single item in the data.
You can pass a table of properties to properties as a DataFrame:

Level (optional) | ID           | [explicit property type] | […] | misc. properties
0                | level 0 item | property value           | …   | {property name: property value}
1                | level 1 item | property value           | …   | {property name: property value}
…                | …            | …                        | …   | …
N                | level N item | property value           | …   | {property name: property value}

The Level column is optional. If not provided, properties will be assigned by ID (i.e., if an ID appears at multiple levels, the same properties will be assigned to all occurrences).
The names of the Level (if provided) and ID columns must be specified by level_col and id_col. misc_props_col can be used to specify the name of the column to be used for miscellaneous properties; if no column by that name is found, a new column will be created and populated with empty dicts. All other columns will be considered explicit property types. The order of the columns does not matter.
This method assumes that there are no rows with the same (Level, ID); if duplicates are found, all but the first occurrence will be dropped.
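The rules in these Notes can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function name `normalize_property_rows` and the use of plain dicts in place of DataFrame rows are assumptions made for illustration. It keeps the first row for each (Level, ID) pair and fills in an empty misc. properties dict where none is given.

```python
def normalize_property_rows(rows, level_col="level", id_col="id",
                            misc_props_col="properties"):
    """Hypothetical helper: dedupe property rows and add a misc. dict."""
    seen = set()
    result = []
    for row in rows:
        # Level is optional, so fall back to None when it is absent.
        key = (row.get(level_col), row[id_col])
        if key in seen:
            continue  # all but the first occurrence are dropped
        seen.add(key)
        normalized = dict(row)
        # Create the misc. properties entry if the row does not have one.
        normalized.setdefault(misc_props_col, {})
        result.append(normalized)
    return result


rows = [
    {"level": 0, "id": "e1", "color": "red", "properties": {"note": "first"}},
    {"level": 1, "id": "n1", "color": "blue"},
    {"level": 0, "id": "e1", "color": "green"},  # duplicate (level, id)
]
normalized = normalize_property_rows(rows)
```

Here `color` plays the role of an explicit property type, while anything stored under `properties` is a miscellaneous property.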
- add(*args)[source]
Updates the underlying data table with new entity data from multiple sources
- Parameters:
*args – variable length argument list of Entity and/or representations of entity data
- Returns:
self
- Return type:
Entity
Warning
Adding an element directly to an Entity will not add the element to any Hypergraphs constructed from that Entity, and will cause an error. Use Hypergraph.add_edge or Hypergraph.add_node_to_edge instead.
- add_element(data)[source]
Updates the underlying data table with new entity data
Supports adding from either an existing Entity or a representation of entity (data table or labeled system of sets are both supported representations)
- Parameters:
data (Entity, pandas.DataFrame, or dict of lists or sets) – new entity data
- Returns:
self
- Return type:
Entity
Warning
Adding an element directly to an Entity will not add the element to any Hypergraphs constructed from that Entity, and will cause an error. Use Hypergraph.add_edge or Hypergraph.add_node_to_edge instead.
See also
add
takes multiple sources of new entity data as variable length argument list
Hypergraph.add_edge, Hypergraph.add_node_to_edge
- add_elements_from(arg_set)[source]
Adds arguments from an iterable to the data table one at a time
- Deprecated since version 2.0.0: Duplicates add
- Parameters:
arg_set (iterable) – list of Entity and/or representations of entity data
- Returns:
self
- Return type:
Entity
- assign_properties(props: DataFrame | dict[int, dict[T, dict[Any, Any]]], misc_col: str | None = None, level_col=0, id_col=1) -> None [source]
Assign new properties to items in the data table, update properties
- Parameters:
props (pandas.DataFrame or doubly-nested dict) – See documentation of the properties parameter in Entity
level_col, id_col, misc_col (str, optional) – column names corresponding to the levels, items, and misc. properties; if None, default to _level_col, _id_col, _misc_props_col, respectively.
- property cell_weights
Cell weights corresponding to each row of the underlying data table
- Returns:
Keyed by row of data table (as a tuple)
- Return type:
dict of {tuple: int or float}
- property children
Labels of all items in level 1 (second column) of the underlying data table
- Return type:
frozenset
- property data
Sparse representation of the data table as an incidence tensor
This can also be thought of as an encoding of dataframe, where items in each column of the data table are translated to their int position in the self.labels[column] list
- Returns:
2D array of ints representing rows of the underlying data table as indices in an incidence tensor
- Return type:
numpy.ndarray
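As a rough illustration of this encoding, the sketch below translates each item to its position in a per-level label list. It is a pure-Python stand-in and not the library's code: the function name `encode_rows` is hypothetical, and labels are ordered by first appearance, which is an assumption; the library may order labels differently.

```python
def encode_rows(rows):
    """Hypothetical helper: map each item to its index in that level's labels."""
    n_levels = len(rows[0]) if rows else 0
    labels = [[] for _ in range(n_levels)]     # one label list per level
    positions = [{} for _ in range(n_levels)]  # item -> index, per level
    encoded = []
    for row in rows:
        codes = []
        for level, item in enumerate(row):
            if item not in positions[level]:
                # Assumed ordering: labels are appended on first appearance.
                positions[level][item] = len(labels[level])
                labels[level].append(item)
            codes.append(positions[level][item])
        encoded.append(tuple(codes))
    return labels, encoded


rows = [("e1", "a"), ("e1", "b"), ("e2", "b")]
labels, encoded = encode_rows(rows)
```

Each tuple in `encoded` is one nonzero cell of the incidence tensor, expressed as indices into `labels`.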
- property dataframe
The underlying data table stored by the Entity
- Return type:
pandas.DataFrame
- property dimensions
Dimensions of data i.e., the number of distinct items in each level (column) of the underlying data table
- Returns:
Length and order correspond to columns of self.dataframe (excluding cell weight column)
- Return type:
tuple of ints
- property dimsize
Number of levels (columns) in the underlying data table
- Returns:
Equal to length of self.dimensions
- Return type:
int
- property elements
System of sets representation of the first two levels (columns) of the underlying data table
Each item in level 0 (first column) defines a set containing all the level 1 (second column) items with which it appears in the same row of the underlying data table
- Returns:
System of sets representation as dict of {level 0 item : AttrList(level 1 items)}
- Return type:
dict of AttrList
See also
incidence_dict
same data as dict of list
memberships
dual of this representation, i.e., each item in level 1 (second column) defines a set
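The relationship between elements and memberships can be sketched in plain Python. This is only an illustration under stated assumptions: `system_of_sets` is a hypothetical name, and plain lists stand in for AttrList.

```python
from collections import defaultdict


def system_of_sets(rows, set_level=0, element_level=1):
    """Hypothetical helper: group items of one level by items of another."""
    result = defaultdict(list)
    for row in rows:
        members = result[row[set_level]]
        if row[element_level] not in members:
            members.append(row[element_level])
    return dict(result)


rows = [("e1", "a"), ("e1", "b"), ("e2", "b")]
elements = system_of_sets(rows)           # level 0 items define the sets
memberships = system_of_sets(rows, 1, 0)  # dual: level 1 items define the sets
```

Swapping the two level arguments is all that distinguishes the dual representation from the original one.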
- elements_by_column(col1, col2)[source]
System of sets representation of two columns (levels) of the underlying data table
Each item in col1 defines a set containing all the col2 items with which it appears in the same row of the underlying data table
Properties can be accessed and assigned to items in col1
- Parameters:
col1 (Hashable) – name of column whose items define sets
col2 (Hashable) – name of column whose items are elements in the system of sets
- Returns:
System of sets representation as dict of {col1 item : AttrList(col2 items)}
- Return type:
dict of AttrList
See also
elements_by_level
same functionality, takes level indices instead of column names
- elements_by_level(level1, level2)[source]
System of sets representation of two levels (columns) of the underlying data table
Each item in level1 defines a set containing all the level2 items with which it appears in the same row of the underlying data table
Properties can be accessed and assigned to items in level1
- Parameters:
level1 (int) – index of level whose items define sets
level2 (int) – index of level whose items are elements in the system of sets
- Returns:
System of sets representation as dict of {level1 item : AttrList(level2 items)}
- Return type:
dict of AttrList
See also
elements_by_column
same functionality, takes column names instead of level indices
- property empty
Whether the underlying data table is empty or not
- Return type:
bool
- encode(data)[source]
Encode dataframe to numpy array
- Parameters:
data (dataframe) –
- Return type:
numpy.array
- get_properties(item: T, level: int | None = None) -> dict[Any, Any] [source]
Get all properties of an item
- Parameters:
item (hashable) – name of an item
level (int, optional) – level index of the item
- Returns:
prop_vals – {named property: property value, ..., misc. property column name: {property name: property value}}
- Return type:
dict
- Raises:
KeyError – if (level, item) is not in properties, or if level is not provided and item is not in properties
- Warns:
UserWarning – If level is not provided and item appears in multiple levels, assumes the first (closest to 0)
- get_property(item: T, prop_name: Any, level: int | None = None) -> Any [source]
Get a property of an item
- Parameters:
item (hashable) – name of an item
prop_name (hashable) – name of the property to get
level (int, optional) – level index of the item
- Returns:
prop_val – value of the property
- Return type:
any
- Raises:
KeyError – if (level, item) is not in properties, or if level is not provided and item is not in properties
- Warns:
UserWarning – If level is not provided and item appears in multiple levels, assumes the first (closest to 0)
- property incidence_dict: dict[T, list[T]]
System of sets representation of the first two levels (columns) of the underlying data table
- Returns:
System of sets representation as dict of {level 0 item : [level 1 items]}
- Return type:
dict of list
See also
elements
same data as dict of AttrList
- incidence_matrix(level1=0, level2=1, weights=False, aggregateby=None, index=False) -> csr_matrix | None [source]
Incidence matrix representation for two levels (columns) of the underlying data table
If level1 and level2 contain N and M distinct items, respectively, the incidence matrix will be M x N. In other words, the items in level1 and level2 correspond to the columns and rows of the incidence matrix, respectively, in the order in which they appear in self.labels[column1] and self.labels[column2] (column1 and column2 are the column labels of level1 and level2)
- Parameters:
level1 (int, default=0) – index of first level (column)
level2 (int, default=1) – index of second level
weights (bool or dict, default=False) – If False, all nonzero entries are 1. If True, all nonzero entries are filled by self.cell_weights dictionary values; use aggregateby to specify how duplicate entries should have weights aggregated. If dict of the form {(level1 item, level2 item): weight value}, only nonzero cells in the incidence matrix will be updated by the dictionary, i.e., level1 item and level2 item must appear in the same row at least once in the underlying data table.
aggregateby ({'sum', 'count', 'mean', 'median', 'max', 'min', 'first', 'last', None}, default=None) – Method to aggregate weights of duplicate rows in data table. If None, then all cell weights will be set to 1.
index (bool, optional) – Not used
- Returns:
sparse representation of incidence matrix (i.e. Compressed Sparse Row matrix)
- Return type:
scipy.sparse.csr.csr_matrix
Note
In the context of Hypergraphs, think level1 = edges, level2 = nodes
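The M x N orientation described above (level2 items as rows, level1 items as columns) can be checked with a small dense sketch. It uses nested lists rather than a scipy sparse matrix and covers only the unweighted case; `dense_incidence` is a hypothetical name, not part of the library.

```python
def dense_incidence(rows, level1_labels, level2_labels):
    """Hypothetical helper: unweighted incidence matrix as nested lists."""
    col = {item: j for j, item in enumerate(level1_labels)}  # level1 -> column
    row = {item: i for i, item in enumerate(level2_labels)}  # level2 -> row
    matrix = [[0] * len(level1_labels) for _ in level2_labels]
    for item1, item2 in rows:
        matrix[row[item2]][col[item1]] = 1
    return matrix


# Two edges (level1) over three nodes (level2) give a 3 x 2 matrix.
rows = [("e1", "a"), ("e1", "b"), ("e2", "b"), ("e2", "c")]
matrix = dense_incidence(rows, ["e1", "e2"], ["a", "b", "c"])
```

Reading down a column gives the nodes of one edge; reading across a row gives the edges containing one node.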
- index(column, value=None)[source]
Get level index corresponding to a column and (optionally) the index of a value in that column
The index of value is its position in the list given by self.labels[column], which is used in the integer encoding of the data table self.data
- Parameters:
column (str) – name of a column in self.dataframe
value (str, optional) – label of an item in the specified column
- Returns:
level index corresponding to column, index of value if provided
- Return type:
int or (int, int)
- indices(column, values)[source]
Get indices of one or more value(s) in a column
- Parameters:
column (str) –
values (str or iterable of str) –
- Returns:
indices of values
- Return type:
list of int
See also
index
for finding level index of a column and index of a single value
- is_empty(level=0)[source]
Whether a specified level (column) of the underlying data table is empty or not
- Return type:
bool
- property isstatic
Whether to treat the underlying data as static or not
If True, the underlying data may not be altered, and the state_dict will never be cleared. Otherwise, rows may be added to and removed from the data table, and updates will clear the state_dict.
- Return type:
bool
- property labels
Labels of all items in each column of the underlying data table
- Returns:
dict of {column name: [item labels]}. The order of [item labels] corresponds to the int encoding of each item in self.data.
- Return type:
dict of lists
- level(item, min_level=0, max_level=None, return_index=True)[source]
First level containing the given item label
Order of levels corresponds to order of columns in self.dataframe
- Parameters:
item (str) –
min_level, max_level (int, optional) – inclusive bounds on range of levels to search for item
return_index (bool, default=True) – If True, return index of item within the level
- Returns:
index of first level containing the item, plus the index of the item within that level if return_index=True; returns None if item is not found
- Return type:
int, (int, int), or None
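The search described for level can be sketched as follows. This is an illustration only: `first_level` is a hypothetical name, it operates on a plain dict shaped like self.labels, and it always returns the item index (the return_index=True case).

```python
def first_level(labels, item, min_level=0, max_level=None):
    """Hypothetical helper: first level whose labels contain the item."""
    columns = list(labels)  # level order follows column order
    last = len(columns) - 1
    if max_level is not None:
        last = min(max_level, last)
    for level in range(min_level, last + 1):  # bounds are inclusive
        items = labels[columns[level]]
        if item in items:
            return level, items.index(item)
    return None


labels = {"edges": ["e1", "e2"], "nodes": ["a", "b", "e1"]}
found = first_level(labels, "e1")
found_later = first_level(labels, "e1", min_level=1)
missing = first_level(labels, "zz")
```

The second call shows how min_level skips an earlier level in which the same label also appears.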
- property memberships
System of sets representation of the first two levels (columns) of the underlying data table
Each item in level 1 (second column) defines a set containing all the level 0 (first column) items with which it appears in the same row of the underlying data table
- Returns:
System of sets representation as dict of {level 1 item : AttrList(level 0 items)}
- Return type:
dict of AttrList
See also
elements
dual of this representation i.e., each item in level 0 (first column) defines a set
- property properties: DataFrame
Properties assigned to items in the underlying data table
- Return type:
pandas.DataFrame
- remove(*args)[source]
Removes all rows containing specified item(s) from the underlying data table
- Parameters:
*args – variable length argument list of item labels
- Returns:
self
- Return type:
Entity
See also
remove_element
remove all rows containing a single specified item
- remove_element(item)[source]
Removes all rows containing a specified item from the underlying data table
- Parameters:
item – item label
- Returns:
self
- Return type:
Entity
See also
remove
same functionality, accepts variable length argument list of item labels
- remove_elements_from(arg_set)[source]
Removes all rows containing specified item(s) from the underlying data table
- Deprecated since version 2.0.0: Duplicates remove
- Parameters:
arg_set (iterable) – list of item labels
- Returns:
self
- Return type:
Entity
- restrict_to_indices(indices, level=0, **kwargs)[source]
Create a new Entity by restricting the data table to rows containing specific items in a given level
- Parameters:
indices (int or iterable of int) – indices of item label(s) in level to restrict to
level (int, default=0) – level index
**kwargs – Extra arguments to Entity constructor
- Return type:
Entity
- restrict_to_levels(levels: int | Iterable[int], weights: bool = False, aggregateby: str | None = 'sum', **kwargs) -> Entity [source]
Create a new Entity by restricting to a subset of levels (columns) in the underlying data table
- Parameters:
levels (array-like of int) – indices of a subset of levels (columns) of data
weights (bool, default=False) – If True, aggregate existing cell weights to get new cell weights. Otherwise, all new cell weights will be 1.
aggregateby ({'sum', 'first', 'last', 'count', 'mean', 'median', 'max', 'min', None}, optional) – Method to aggregate weights of duplicate rows in data table. If None or weights=False, then all new cell weights will be 1.
**kwargs – Extra arguments to Entity constructor
- Return type:
Entity
- Raises:
KeyError – If levels contains any invalid values
See also
EntitySet
- set_property(item: T, prop_name: Any, prop_val: Any, level: int | None = None) -> None [source]
Set a property of an item
- Parameters:
item (hashable) – name of an item
prop_name (hashable) – name of the property to set
prop_val (any) – value of the property to set
level (int, optional) – level index of the item; required if item is not already in properties
- Raises:
ValueError – If level is not provided and item is not in properties
- Warns:
UserWarning – If level is not provided and item appears in multiple levels, assumes the first (closest to 0)
- size(level=0)[source]
The number of items in a level of the underlying data table
Equivalent to self.dimensions[level]
- Parameters:
level (int, default=0) –
- Return type:
int
- translate(level, index)[source]
Given indices of a level and value(s), return the corresponding value label(s)
- Parameters:
level (int) – level index
index (int or list of int) – value index or indices
- Returns:
label(s) corresponding to value index or indices
- Return type:
str or list of str
See also
translate_arr
translate a full row of value indices across all levels (columns)
- translate_arr(coords)[source]
Translate a full encoded row of the data table, e.g., a row of self.data
- Parameters:
coords (tuple of ints) – encoded value indices, with one value index for each level of the data
- Returns:
full row of translated value labels
- Return type:
list of str
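Translation is the inverse of the integer encoding: each value index is looked up in the label list of its level. The sketch below assumes a plain dict shaped like self.labels; `translate_row` is a hypothetical name, not the library's method.

```python
def translate_row(labels, coords):
    """Hypothetical helper: turn one encoded row back into item labels."""
    columns = list(labels)  # level i corresponds to the i-th column
    return [labels[columns[level]][index] for level, index in enumerate(coords)]


labels = {"edges": ["e1", "e2"], "nodes": ["a", "b", "c"]}
translated = translate_row(labels, (1, 2))
```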
- property uid
User-defined unique identifier for the Entity
- Return type:
hashable
- property uidset
Labels of all items in level 0 (first column) of the underlying data table
- Return type:
frozenset
- uidset_by_column(column)[source]
Labels of all items in a particular column (level) of the underlying data table
- Parameters:
column (Hashable) – Name of a column in self.dataframe
- Return type:
frozenset
See also
uidset
Labels of all items in level 0 (first column)
children
Labels of all items in level 1 (second column)
uidset_by_level
Same functionality, takes the level index instead of column name
- uidset_by_level(level)[source]
Labels of all items in a particular level (column) of the underlying data table
- Parameters:
level (int) –
- Return type:
frozenset
See also
uidset
Labels of all items in level 0 (first column)
children
Labels of all items in level 1 (second column)
uidset_by_column
Same functionality, takes the column name instead of level index
classes.entityset module
- class classes.entityset.EntitySet(entity: pd.DataFrame | np.ndarray | Mapping[T, Iterable[T]] | Iterable[Iterable[T]] | Mapping[T, Mapping[T, Mapping[T, Any]]] | None = None, data: np.ndarray | None = None, labels: OrderedDict[T, Sequence[T]] | None = None, level1: str | int = 0, level2: str | int = 1, weight_col: str | int = 'cell_weights', weights: Sequence[float] | float | int | str = 1, cell_properties: Sequence[T] | pd.DataFrame | dict[T, dict[T, dict[Any, Any]]] | None = None, misc_cell_props_col: str = 'cell_properties', uid: Hashable | None = None, aggregateby: str | None = 'sum', properties: pd.DataFrame | dict[int, dict[T, dict[Any, Any]]] | None = None, misc_props_col: str = 'properties', **kwargs)[source]
Bases: Entity
Class for handling 2-dimensional (i.e., system of sets, bipartite) data when building network-like models, i.e., Hypergraph
- Parameters:
entity (Entity, pandas.DataFrame, dict of lists or sets, or list of lists or sets, optional) – If an Entity with N levels or a DataFrame with N columns, represents N-dimensional entity data (data table). If N > 2, only considers levels (columns) level1 and level2. Otherwise, represents 2-dimensional entity data (system of sets).
data (numpy.ndarray, optional) – 2D M x N ndarray of ints (data table); sparse representation of an N-dimensional incidence tensor with M nonzero cells. If N > 2, only considers levels (columns) level1 and level2. Ignored if entity is provided.
labels (collections.OrderedDict of lists, optional) – User-specified labels in corresponding order to ints in data. For M x N data, N > 2, labels must contain either 2 or N keys. If N keys, only considers labels for levels (columns) level1 and level2. Ignored if entity is provided or data is not provided.
level1, level2 (str or int, default=0,1) – Each item in level1 defines a set containing all the level2 items with which it appears in the same row of the underlying data table. If int, gives the index of a level; if str, gives the name of a column in entity. Ignored if entity, data (if entity not provided), and labels all (if provided) represent 1- or 2-dimensional data (set or system of sets).
weights (str or sequence of float, optional) – User-specified cell weights corresponding to entity data.
- If sequence of floats and entity or data defines a data table, length must equal the number of rows.
- If sequence of floats and entity defines a system of sets, length must equal the total sum of the sizes of all sets.
- If str and entity is a DataFrame, must be the name of a column in entity.
Otherwise, weight for all cells is assumed to be 1. Ignored if entity is an Entity and keep_weights=True.
keep_weights (bool, default=True) – Whether to preserve any existing cell weights; ignored if entity is not an Entity.
cell_properties (str, list of str, pandas.DataFrame, or doubly-nested dict, optional) – User-specified properties to be assigned to cells of the incidence matrix, i.e., rows in a data table; pairs of (set, element of set) in a system of sets. See Notes for detailed explanation. Ignored if underlying data is 1-dimensional (set). If doubly-nested dict, {level1 item: {level2 item: {cell property name: cell property value}}}.
misc_cell_props_col (str, default='cell_properties') – Column name for miscellaneous cell properties; see Notes for explanation.
kwargs – Keyword arguments passed to the Entity constructor, e.g., static, uid, aggregateby, properties, etc. See Entity for documentation of these parameters.
Notes
A cell property is a named attribute assigned jointly to a set and one of its elements, i.e., a cell of the incidence matrix.
When an Entity or DataFrame is passed to the entity parameter of the constructor, it should represent a data table:

Column_1     | Column_2     | Column_3     | […] | Column_N
level 1 item | level 2 item | level 3 item | …   | level N item
…            | …            | …            | …   | …

Assuming the default values for parameters level1, level2, the data table will be restricted to the set system defined by Column 1 and Column 2. Since each row of the data table represents an incidence or cell, values from other columns may contain data that should be converted to cell properties.
By passing a column name or list of column names as cell_properties, each given column will be preserved in the cell_properties as an explicit cell property type. An additional column in cell_properties will be created to store a dict of miscellaneous cell properties, which will store cell properties of types that have not been explicitly defined and do not have a dedicated column (which may be assigned after construction). The name of the miscellaneous column is determined by misc_cell_props_col.
You can also pass a pre-constructed table to cell_properties as a DataFrame:

Column_1     | Column_2     | [explicit cell prop. type] | […] | misc. cell properties
level 1 item | level 2 item | cell property value        | …   | {cell property name: cell property value}
…            | …            | …                          | …   | …

Column 1 and Column 2 must have the same names as the corresponding columns in the entity data table, and misc_cell_props_col can be used to specify the name of the column to be used for miscellaneous cell properties. If no column by that name is found, a new column will be created and populated with empty dicts. All other columns will be considered explicit cell property types. The order of the columns does not matter.
Both of these methods assume that there are no row duplicates in the tables passed to entity and/or cell_properties; if duplicates are found, all but the first occurrence will be dropped.
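To make the doubly-nested dict form of cell_properties concrete, the sketch below flattens it into a table keyed by (level1 item, level2 item) and looks up one cell property. Both function names are hypothetical, and a plain dict stands in for the DataFrame the library stores; this is an illustration, not the library's implementation.

```python
def flatten_cell_properties(cell_props, misc_col="cell_properties"):
    """Hypothetical helper: one table row per (level1 item, level2 item) cell."""
    table = {}
    for item1, inner in cell_props.items():
        for item2, props in inner.items():
            # With no explicit property columns, everything is miscellaneous.
            table[(item1, item2)] = {misc_col: dict(props)}
    return table


def lookup_cell_property(table, item1, item2, prop_name,
                         misc_col="cell_properties"):
    """Hypothetical helper: prefer an explicit column, else the misc. dict."""
    row = table[(item1, item2)]
    if prop_name in row:
        return row[prop_name]
    return row[misc_col][prop_name]


cell_props = {
    "e1": {"a": {"role": "source"}, "b": {"role": "target"}},
    "e2": {"b": {"role": "source"}},
}
table = flatten_cell_properties(cell_props)
role = lookup_cell_property(table, "e1", "b", "role")
```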
- assign_cell_properties(cell_props: DataFrame | dict[T, dict[T, dict[Any, Any]]], misc_col: str | None = None, replace: bool = False) -> None [source]
Assign new properties to cells of the incidence matrix and update properties
- Parameters:
cell_props (pandas.DataFrame, dict of iterables, or doubly-nested dict, optional) – See documentation of the cell_properties parameter in EntitySet
misc_col (str, optional) – name of column to be used for miscellaneous cell property dicts
replace (bool, default=False) – If True, replace existing cell_properties with result; otherwise update with new values from result
- Raises:
AttributeError – Not supported for dimsize=1
- property cell_properties: DataFrame | None
Properties assigned to cells of the incidence matrix
- Returns:
Returns None if dimsize < 2
- Return type:
pandas.DataFrame, optional
- collapse_identical_elements(return_equivalence_classes: bool = False, **kwargs) -> EntitySet | tuple[classes.entityset.EntitySet, dict[str, list[str]]] [source]
Create a new EntitySet by collapsing sets with the same set elements
Each item in level 0 (first column) defines a set containing all the level 1 (second column) items with which it appears in the same row of the underlying data table.
- Parameters:
return_equivalence_classes (bool, default=False) – If True, return a dictionary of equivalence classes keyed by new edge names
**kwargs – Extra arguments to EntitySet constructor
- Returns:
new_entity (EntitySet) – new EntitySet with identical sets collapsed; if all sets are unique, the system of sets will be the same as the original.
equivalence_classes (dict of lists, optional) – if return_equivalence_classes=True, {collapsed set label: [level 0 item labels]}.
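The idea of collapsing identical sets can be sketched with plain dicts. This is not the library's implementation: `collapse_identical_sets` is a hypothetical name, and the scheme used for the new set labels (first label plus class size) is an assumption chosen for illustration.

```python
def collapse_identical_sets(elements):
    """Hypothetical helper: merge sets that contain exactly the same items."""
    classes = {}
    for label, members in elements.items():
        # Sets are identical when their members match, regardless of order.
        classes.setdefault(frozenset(members), []).append(label)
    collapsed = {}
    equivalence_classes = {}
    for members, labels in classes.items():
        new_label = f"{labels[0]}: {len(labels)}"  # assumed naming scheme
        collapsed[new_label] = sorted(members)
        equivalence_classes[new_label] = labels
    return collapsed, equivalence_classes


elements = {"e1": ["a", "b"], "e2": ["b", "a"], "e3": ["c"]}
collapsed, equivalence_classes = collapse_identical_sets(elements)
```

The second return value mirrors the {collapsed set label: [level 0 item labels]} mapping described above.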
- get_cell_properties(item1: T, item2: T) -> dict[Any, Any] [source]
Get all properties of a cell, i.e., incidence between items of different levels
- Parameters:
item1 (hashable) – name of an item in level 0
item2 (hashable) – name of an item in level 1
- Returns:
{named cell property: cell property value, ..., misc. cell property column name: {cell property name: cell property value}}
- Return type:
dict
- get_cell_property(item1: T, item2: T, prop_name: Any) -> Any [source]
Get a property of a cell i.e., incidence between items of different levels
- Parameters:
item1 (hashable) – name of an item in level 0
item2 (hashable) – name of an item in level 1
prop_name (hashable) – name of the cell property to get
- Returns:
prop_val – value of the cell property
- Return type:
any
- property memberships: dict[str, hypernetx.classes.helpers.AttrList[str]]
Extends Entity.memberships
Each item in level 1 (second column) defines a set containing all the level 0 (first column) items with which it appears in the same row of the underlying data table.
- Returns:
System of sets representation as dict of {level 1 item: AttrList(level 0 items)}.
- Return type:
dict of AttrList
See also
elements
dual of this representation, i.e., each item in level 0 (first column) defines a set
restrict_to_levels
for more information on how memberships work for 1-dimensional (set) data
- restrict_to(indices: int | Iterable[int], **kwargs) -> EntitySet [source]
Alias of restrict_to_indices() with default parameter level=0
- Parameters:
indices (array_like of int) – indices of item label(s) in level to restrict to
**kwargs – Extra arguments to EntitySet constructor
- Return type:
EntitySet
See also
restrict_to_indices
- restrict_to_levels(levels: int | Iterable[int], weights: bool = False, aggregateby: str | None = 'sum', keep_memberships: bool = True, **kwargs) -> EntitySet [source]
Extends Entity.restrict_to_levels()
- Parameters:
levels (array-like of int) – indices of a subset of levels (columns) of data
weights (bool, default=False) – If True, aggregate existing cell weights to get new cell weights. Otherwise, all new cell weights will be 1.
aggregateby ({'sum', 'first', 'last', 'count', 'mean', 'median', 'max', 'min', None}, optional) – Method to aggregate weights of duplicate rows in data table. If None or weights=False, then all new cell weights will be 1.
keep_memberships (bool, default=True) – Whether to preserve membership information for the discarded level when the new EntitySet is restricted to a single level
**kwargs – Extra arguments to EntitySet constructor
- Return type:
EntitySet
- Raises:
KeyError – If levels contains any invalid values
- set_cell_property(item1: T, item2: T, prop_name: Any, prop_val: Any) -> None [source]
Set a property of a cell i.e., incidence between items of different levels
- Parameters:
item1 (hashable) – name of an item in level 0
item2 (hashable) – name of an item in level 1
prop_name (hashable) – name of the cell property to set
prop_val (any) – value of the cell property to set
classes.helpers module
- class classes.helpers.AttrList(entity: Entity, key: tuple[int, str | int], initlist: list | None = None)[source]
Bases: UserList
Custom list wrapper for integrated property storage in Entity
- Parameters:
entity (hypernetx.Entity) –
key (tuple of (int, str or int)) – (level, item)
initlist (list, optional) – list of elements, passed to UserList constructor
- classes.helpers.assign_weights(df, weights=1, weight_col='cell_weights')[source]
- Parameters:
df (pandas.DataFrame) – A DataFrame to assign a weight column to
weights (array-like or Hashable, optional) – If numpy.ndarray with the same length as df, create a new weight column with these values. If Hashable, must be the name of a column of df to assign as the weight column. Otherwise, create a new weight column assigning a weight of 1 to every row.
weight_col (Hashable) – Name for new column if one is created (not used if the name of an existing column is passed as weights)
- Returns:
df (pandas.DataFrame) – The original DataFrame with a new column added if needed
weight_col (str) – Name of the column assigned to hold weights
Note
TODO: move logic for default weights inside this method
- classes.helpers.create_properties(props: DataFrame | dict[str | int, collections.abc.Iterable[str | int]] | dict[str | int, dict[str | int, dict[Any, Any]]] | None, index_cols: list[str], misc_col: str) DataFrame [source]
Helper function for initializing properties and cell properties
- Parameters:
props (pandas.DataFrame, dict of iterables, doubly-nested dict, or None) – See documentation of the properties parameter in Entity, cell_properties parameter in EntitySet
index_cols (list of str) – names of columns to be used as levels of the MultiIndex
misc_col (str) – name of column to be used for miscellaneous property dicts
- Returns:
DataFrame with MultiIndex on index_cols; each entry of the miscellaneous column holds a dict of {property name: property value}
- Return type:
pandas.DataFrame
- classes.helpers.encode(data: DataFrame)[source]
Encode dataframe to numpy array
- Parameters:
data (dataframe) –
- Return type:
numpy.array
- classes.helpers.remove_row_duplicates(df, data_cols, weights=1, weight_col='cell_weights', aggregateby=None)[source]
Removes and aggregates duplicate rows of a DataFrame using groupby
- Parameters:
df (pandas.DataFrame) – A DataFrame to remove or aggregate duplicate rows from
data_cols (list) – A list of column names in df to perform the groupby on / remove duplicates from
weights (array-like or Hashable, optional) – Argument passed to assign_weights
aggregateby (str, optional, default=None) – A valid aggregation method for pandas groupby. If None, drop duplicates without aggregating weights.
- Returns:
df (pandas.DataFrame) – The DataFrame with duplicate rows removed or aggregated
weight_col (Hashable) – The name of the column holding aggregated weights, or None if aggregateby=None
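The aggregation behavior described for remove_row_duplicates can be pictured with a small pure-Python sketch. It is not the library's implementation (the docstring says that uses pandas groupby); the name dedupe_rows, the tuple row layout, and the subset of aggregation methods are assumptions made for illustration.

```python
from collections import OrderedDict


def dedupe_rows(rows, weights, aggregateby="sum"):
    """Hypothetical helper: rows is a list of hashable tuples, weights a parallel list."""
    grouped = OrderedDict()
    for row, w in zip(rows, weights):
        grouped.setdefault(row, []).append(w)
    if aggregateby is None:
        # Drop duplicates without aggregating weights; no weight column is returned.
        return list(grouped), None
    aggregators = {
        "sum": sum,
        "first": lambda ws: ws[0],
        "last": lambda ws: ws[-1],
        "count": len,
        "max": max,
        "min": min,
    }
    agg = aggregators[aggregateby]
    return list(grouped), [agg(ws) for ws in grouped.values()]


rows = [("e1", 1), ("e1", 2), ("e1", 1)]
unique_rows, new_weights = dedupe_rows(rows, [0.5, 0.1, 0.25], aggregateby="sum")
first_rows, first_weights = dedupe_rows(rows, [0.5, 0.1, 0.25], aggregateby="first")
dropped_rows, dropped_weights = dedupe_rows(rows, [0.5, 0.1, 0.25], aggregateby=None)
```

With aggregateby="sum" the two copies of ("e1", 1) merge into one row whose weight is the sum of both; with aggregateby=None only the unique rows remain.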
classes.hypergraph module
- class classes.hypergraph.Hypergraph(setsystem: DataFrame | ndarray | Mapping[T, Iterable[T]] | Iterable[Iterable[T]] | Mapping[T, Mapping[T, Mapping[str, Any]]] | None = None, edge_col: str | int = 0, node_col: str | int = 1, cell_weight_col: str | int | None = 'cell_weights', cell_weights: Sequence[float] | float = 1.0, cell_properties: Sequence[str | int] | Mapping[T, Mapping[T, Mapping[str, Any]]] | None = None, misc_cell_properties_col: str | int | None = None, aggregateby: str | dict[str, str] = 'first', edge_properties: DataFrame | dict[T, dict[Any, Any]] | None = None, node_properties: DataFrame | dict[T, dict[Any, Any]] | None = None, properties: DataFrame | dict[T, dict[Any, Any]] | dict[T, dict[T, dict[Any, Any]]] | None = None, misc_properties_col: str | int | None = None, edge_weight_prop_col: str | int = 'weight', node_weight_prop_col: str | int = 'weight', weight_prop_col: str | int = 'weight', default_edge_weight: float | None = None, default_node_weight: float | None = None, default_weight: float = 1.0, name: str | None = None, **kwargs)[source]
Bases:
object
- Parameters:
setsystem ((optional) dict of iterables, dict of dicts, iterable of iterables, pandas.DataFrame, numpy.ndarray, default = None) – See the SetSystems section for additional setsystem requirements.
edge_col ((optional) str | int, default = 0) – column index (or name) in pandas.dataframe or numpy.ndarray, used for (hyper)edge ids. Will be used to reference edgeids for all set systems.
node_col ((optional) str | int, default = 1) – column index (or name) in pandas.dataframe or numpy.ndarray, used for node ids. Will be used to reference nodeids for all set systems.
cell_weight_col ((optional) str | int, default = 'cell_weights') – column index (or name) in pandas.dataframe or numpy.ndarray used for referencing cell weights. For a dict of dicts references key in cell property dicts.
cell_weights ((optional) Sequence[float,int] | int | float , default = 1.0) – User specified cell_weights or default cell weight. Sequential values are only used if setsystem is a dataframe or ndarray in which case the sequence must have the same length and order as these objects. Sequential values are ignored for dataframes if cell_weight_col is already a column in the data frame. If cell_weights is assigned a single value then it will be used as default for missing values or when no cell_weight_col is given.
cell_properties ((optional) Sequence[int | str] | Mapping[T,Mapping[T,Mapping[str,Any]]], default = None) – Column names from pd.DataFrame to use as cell properties or a dict assigning cell_property to incidence pairs of edges and nodes. Will generate a misc_cell_properties column, which may have variable lengths per cell.
misc_cell_properties_col ((optional) str | int, default = None) – Column name of dataframe corresponding to a column of variable length property dictionaries for the cell. Ignored for other setsystem types.
aggregateby ((optional) str, dict, default = 'first') – By default duplicate edge,node incidences will be dropped unless specified with aggregateby. See pandas.DataFrame.agg() methods for additional syntax and usage information.
edge_properties ((optional) pd.DataFrame | dict, default = None) – Properties associated with edge ids. First column of dataframe or keys of dict link to edge ids in setsystem.
node_properties ((optional) pd.DataFrame | dict, default = None) – Properties associated with node ids. First column of dataframe or keys of dict link to node ids in setsystem.
properties ((optional) pd.DataFrame | dict, default = None) – Concatenation/union of edge_properties and node_properties. By default, the object id is used and should be the first column of the dataframe, or key in the dict. If there are nodes and edges with the same ids and different properties then use the edge_properties and node_properties keywords.
misc_properties_col ((optional) int | str, default = None) – Column of property dataframes with dtype=dict. Intended for variable length property dictionaries for the objects.
edge_weight_prop_col ((optional) str | int, default = 'weight') – Name of property in edge_properties to use for weight.
node_weight_prop_col ((optional) str | int, default = 'weight') – Name of property in node_properties to use for weight.
weight_prop_col ((optional) str | int, default = 'weight') – Name of property in properties to use for ‘weight’
default_edge_weight ((optional) int | float, default = 1) – Used when edge weight property is missing or undefined.
default_node_weight ((optional) int | float, default = 1) – Used when node weight property is missing or undefined
name ((optional) str, default = None) – Name assigned to hypergraph
Hypergraphs in HNX 2.0
An hnx.Hypergraph H = (V,E) references a pair of disjoint sets: V = nodes (vertices) and E = (hyper)edges.
HNX allows for multi-edges by distinguishing edges by their identifiers instead of their contents. For example, if V = {1,2,3} and E = {e1,e2,e3}, where e1 = {1,2}, e2 = {1,2}, and e3 = {1,2,3}, the edges e1 and e2 contain the same set of nodes and yet are distinct and are distinguishable within H = (V,E).
New as of version 2.0, HNX provides methods to easily store and access additional metadata such as cell, edge, and node weights. Metadata associated with (edge,node) incidences are referenced as cell_properties. Metadata associated with a single edge or node is referenced as its properties.
The fundamental object needed to create a hypergraph is a setsystem. The setsystem defines the many-to-many relationships between edges and nodes in the hypergraph. Cell properties for the incidence pairs can be defined within the setsystem or in a separate pandas.DataFrame or dict. Edge and node properties are defined with a pandas.DataFrame or dict.
SetSystems
There are five types of setsystems currently accepted by the library.
iterable of iterables : Barebones hypergraph that uses Pandas default indexing to generate hyperedge ids. Elements must be hashable:
>>> H = Hypergraph([{1,2},{1,2},{1,2,3}])
dictionary of iterables : the most basic way to express many-to-many relationships providing edge ids. The elements of the iterables must be hashable:
>>> H = Hypergraph({'e1':[1,2],'e2':[1,2],'e3':[1,2,3]})
dictionary of dictionaries : allows cell properties to be assigned to a specific (edge, node) incidence. This is particularly useful when there are variable length dictionaries assigned to each pair:
>>> d = {'e1': {1: {'w': 0.5, 'name': 'related_to'},
...             2: {'w': 0.1, 'name': 'related_to',
...                 'startdate': '05.13.2020'}},
...      'e2': {1: {'w': 0.52, 'name': 'owned_by'},
...             2: {'w': 0.2}},
...      'e3': {1: {'w': 0.5, 'name': 'related_to'},
...             2: {'w': 0.2, 'name': 'owner_of'},
...             3: {'w': 1, 'type': 'relationship'}}}
>>> H = Hypergraph(d, cell_weight_col='w')
pandas.DataFrame : For large datasets and for datasets with cell properties it is most efficient to construct a hypergraph directly from a pandas.DataFrame. Incidence pairs are in the first two columns. Cell properties shared by all incidence pairs can be placed in their own column of the dataframe. Variable length dictionaries of cell properties particular to only some of the incidence pairs may be placed in a single column of the dataframe. Representing the data above as a dataframe df:
col1  col2  w     col3
e1    1     0.5   {'name':'related_to'}
e1    2     0.1   {"name":"related_to", "startdate":"05.13.2020"}
e2    1     0.52  {"name":"owned_by"}
e2    2     0.2
…     …     …     {…}
The first row of the dataframe is used to reference each column.
>>> H = Hypergraph(df, edge_col="col1", node_col="col2",
...                cell_weight_col="w", misc_cell_properties_col="col3")
numpy.ndarray : For homogeneous datasets given in an ndarray a pandas dataframe is generated and column names are added from the edge_col and node_col arguments. Cell properties containing multiple data types are added with a separate dataframe or dict and passed through the cell_properties keyword.
>>> arr = np.array([['e1','1'],['e1','2'],
...                 ['e2','1'],['e2','2'],
...                 ['e3','1'],['e3','2'],['e3','3']])
>>> H = hnx.Hypergraph(arr, edge_col='col1', node_col='col2')
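The non-tabular setsystem types all reduce to the same list of (edge, node) incidence pairs, optionally with a cell property dict. The sketch below shows that reduction in plain Python; incidence_pairs is a hypothetical helper for illustration, not part of the library, and the use of positional integers as edge ids for an iterable of iterables is an assumption standing in for the default indexing.

```python
def incidence_pairs(setsystem):
    """Hypothetical helper: list (edge, node, cell_props) rows from a setsystem."""
    if isinstance(setsystem, dict):
        items = setsystem.items()
    else:
        # Iterable of iterables: use positional integers as edge ids.
        items = enumerate(setsystem)
    rows = []
    for edge, members in items:
        if isinstance(members, dict):
            # Dictionary of dictionaries: values are per-cell property dicts.
            for node, props in members.items():
                rows.append((edge, node, dict(props)))
        else:
            for node in members:
                rows.append((edge, node, {}))
    return rows


pairs_from_lists = incidence_pairs([[1, 2], [1, 2], [1, 2, 3]])
pairs_from_dict = incidence_pairs({"e1": [1, 2], "e2": [1, 2, 3]})
pairs_from_nested = incidence_pairs({"e1": {1: {"w": 0.5}, 2: {"w": 0.1}}})
```

Each row corresponds to one line of the dataframe layout: the first two fields are the incidence pair and the third plays the role of the variable-length property column.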
Edge and Node Properties
Properties specific to a single edge or node are passed through the keywords: edge_properties, node_properties, properties. Properties may be passed as dataframes or dicts. The first column or index of the dataframe, or the keys of the dict, correspond to the edge and/or node identifiers. If identifiers are shared among edges and nodes, or are distinct for edges and nodes, properties may be combined into a single object and passed to the properties keyword. For example:
id  weight  properties
e1  5.0     {'type':'event'}
e2  0.52    {"name":"owned_by"}
…   …       {…}
1   1.2     {'color':'red'}
2   .003    {'name':'Fido','color':'brown'}
3   1.0     {}
A properties dictionary should have the format:
dp = {id1 : {prop1:val1, prop2:val2,...}, id2 : ... }
A properties dataframe may be used for nodes and edges sharing ids but differing in cell properties by adding a level index using 0 for edges and 1 for nodes:
level  id  weight  properties
0      e1  5.0     {'type':'event'}
0      e2  0.52    {"name":"owned_by"}
…      …   …       {…}
1      1   1.2     {'color':'red'}
1      2   .003    {'name':'Fido','color':'brown'}
…      …   …       {…}
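One way to picture the level index is as a dictionary keyed by (level, id), with 0 for edges and 1 for nodes. The sketch below is only an illustration of that idea; get_props, the dict layout, and the merge behavior when no level is given are assumptions, not the library's storage format.

```python
def get_props(props, item_id, level=None):
    """Hypothetical lookup over a dict keyed by (level, id)."""
    if level is not None:
        return dict(props.get((level, item_id), {}))
    # Assumption: with no level given, merge matches from both levels, edges (0) first.
    merged = {}
    for lvl in (0, 1):
        merged.update(props.get((lvl, item_id), {}))
    return merged


props = {
    (0, "e1"): {"weight": 5.0, "type": "event"},
    (1, 1): {"weight": 1.2, "color": "red"},
    (1, "e1"): {"weight": 0.5},  # a node that shares its id with an edge
}

edge_props = get_props(props, "e1", level=0)
node_props = get_props(props, "e1", level=1)
```

Passing the level keeps the edge named "e1" and the node named "e1" apart even though they share an identifier.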
Weights
The default key for cell and object weights is “weight”. The default value is 1. Weights may be assigned and/or a new default prescribed in the constructor using cell_weight_col and cell_weights for incidence pairs, and using edge_weight_prop_col, node_weight_prop_col, weight_prop_col, default_edge_weight, and default_node_weight for node and edge weights.
- adjacency_matrix(s=1, index=False, remove_empty_rows=False)[source]
The s-adjacency matrix for the hypergraph.
- Parameters:
s (int, optional, default = 1) –
index (boolean, optional, default = False) – if True, will return the index of ids for rows and columns
remove_empty_rows (boolean, optional, default = False) –
- Returns:
adjacency_matrix (scipy.sparse.csr.csr_matrix)
node_index (list) – index of ids for rows and columns
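As a rough illustration of what an s-adjacency matrix encodes, the sketch below counts shared edges for every pair of nodes and thresholds the count at s. It uses nested lists instead of a scipy sparse matrix, and the function name, the sorted node order, and the 0/1 entries are assumptions for this example.

```python
from itertools import combinations


def s_adjacency(incidence, s=1):
    """Return (matrix, node_index); matrix[i][j] is 1 if nodes i and j share >= s edges."""
    nodes = sorted({n for members in incidence.values() for n in members})
    position = {n: i for i, n in enumerate(nodes)}
    shared = [[0] * len(nodes) for _ in nodes]
    for members in incidence.values():
        # Every pair of nodes inside one edge shares that edge.
        for u, v in combinations(sorted(set(members)), 2):
            shared[position[u]][position[v]] += 1
            shared[position[v]][position[u]] += 1
    matrix = [[1 if count >= s else 0 for count in row] for row in shared]
    return matrix, nodes


incidence = {"e1": [1, 2], "e2": [1, 2], "e3": [1, 2, 3]}
adj1, index = s_adjacency(incidence, s=1)
adj2, _ = s_adjacency(incidence, s=2)
```

Nodes 1 and 2 share three edges, so they stay adjacent for s=2, while node 3 shares only one edge with each of them and becomes isolated.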
- auxiliary_matrix(s=1, node=True, index=False)[source]
The unweighted s-edge or node auxiliary matrix for hypergraph
- Parameters:
s (int, optional, default = 1) –
node (bool, optional, default = True) – whether to return based on node or edge adjacencies
- Returns:
auxiliary_matrix (scipy.sparse.csr.csr_matrix) – Node/Edge adjacency matrix with empty rows and columns removed
index (np.array) – row and column index of userids
- bipartite()[source]
Constructs the networkX bipartite graph associated to hypergraph.
- Returns:
bipartite
- Return type:
nx.Graph()
Notes
Creates a bipartite networkx graph from hypergraph. The nodes and (hyper)edges of hypergraph become the nodes of bipartite graph. For every (hyper)edge e in the hypergraph and node n in e there is an edge (n,e) in the graph.
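The construction in the note can be sketched without networkx: every hyperedge and every node becomes a vertex, and each incidence becomes a link (n, e). The helper name and the returned tuple layout are assumptions for illustration.

```python
def bipartite_graph(incidence):
    """Return (node_part, edge_part, links) for a hypergraph incidence dict."""
    node_part, edge_part, links = set(), set(), []
    for edge, members in incidence.items():
        edge_part.add(edge)
        for node in members:
            node_part.add(node)
            # One bipartite link per (node, hyperedge) incidence.
            links.append((node, edge))
    return node_part, edge_part, links


node_part, edge_part, links = bipartite_graph({"e1": [1, 2], "e2": [2, 3]})
```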
- collapse_edges(name=None, return_equivalence_classes=False, use_reps=None, return_counts=None)[source]
Constructs a new hypergraph gotten by identifying edges containing the same nodes
- Parameters:
name (hashable, optional, default = None) –
return_equivalence_classes (boolean, optional, default = False) – Returns a dictionary of edge equivalence classes keyed by frozen sets of nodes
- Returns:
new hypergraph (Hypergraph) – Equivalent edges are collapsed to a single edge named by a representative of the equivalent edges followed by a colon and the number of edges it represents.
equivalence_classes (dict) – A dictionary keyed by representative edge names with values equal to the edges in its equivalence class
Notes
Two edges are identified if their respective elements are the same. Using this as an equivalence relation, the uids of the edges are partitioned into equivalence classes.
The uid of each new edge is a single edge from the collapsed edges followed by a colon and the number of elements in its equivalence class.
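The equivalence relation in the notes can be sketched in a few lines: group edge uids by their frozen set of nodes, then name each class by a representative and its size. Choosing the smallest uid as representative and keying the equivalence classes by the new uid are assumptions of this sketch; the library may choose differently.

```python
from collections import defaultdict


def collapse_edges(incidence):
    """Group edges with identical node sets; return (collapsed, equivalence_classes)."""
    classes = defaultdict(list)
    for edge, members in incidence.items():
        classes[frozenset(members)].append(edge)
    collapsed, equivalence = {}, {}
    for members, edges in classes.items():
        rep = sorted(edges)[0]  # assumption: pick the smallest uid as representative
        new_uid = f"{rep}: {len(edges)}"
        collapsed[new_uid] = sorted(members)
        equivalence[new_uid] = sorted(edges)
    return collapsed, equivalence


collapsed, equivalence = collapse_edges(
    {"E1": ("a", "b"), "E2": ("a", "b"), "E3": ("a",)}
)
```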
- collapse_nodes(name=None, return_equivalence_classes=False, use_reps=None, return_counts=None) Hypergraph [source]
Constructs a new hypergraph gotten by identifying nodes contained by the same edges
- Parameters:
name (str, optional, default = None) –
return_equivalence_classes (boolean, optional, default = False) – Returns a dictionary of node equivalence classes keyed by frozen sets of edges
use_reps (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE] Choose a single element from the collapsed nodes as uid for the new node, otherwise uses a frozen set of the uids of nodes in the equivalence class. If use_reps is True the new nodes have uids given by a tuple of the rep and the count
return_counts (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE]
- Returns:
new hypergraph
- Return type:
Notes
Two nodes are identified if their respective memberships are the same. Using this as an equivalence relation, the uids of the nodes are partitioned into equivalence classes. A single member of the equivalence class is chosen to represent the class followed by the number of members of the class.
Example
>>> data = {'E1': ('a', 'b'), 'E2': ('a', 'b')}
>>> h = Hypergraph(data)
>>> h.collapse_nodes().incidence_dict
{'E1': ['a: 2'], 'E2': ['a: 2']}
- collapse_nodes_and_edges(name=None, return_equivalence_classes=False, use_reps=None, return_counts=None)[source]
Returns a new hypergraph by collapsing nodes and edges.
- Parameters:
name (str, optional, default = None) –
return_equivalence_classes (boolean, optional, default = False) – Returns a dictionary of edge equivalence classes keyed by frozen sets of nodes
use_reps (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE] Choose a single element from the collapsed elements as a representative. If use_reps is True, the new elements are keyed by a tuple of the rep and the count.
return_counts (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE]
- Returns:
new hypergraph
- Return type:
Notes
Collapses the Nodes and Edges of EntitySets. Two nodes(edges) are duplicates if their respective memberships(elements) are the same. Using this as an equivalence relation, the uids of the nodes(edges) are partitioned into equivalence classes. A single member of the equivalence class is chosen to represent the class followed by the number of members of the class.
Example
>>> data = {'E1': ('a', 'b'), 'E2': ('a', 'b')}
>>> h = Hypergraph(data)
>>> h.incidence_dict
{'E1': ['a', 'b'], 'E2': ['a', 'b']}
>>> h.collapse_nodes_and_edges().incidence_dict
{'E1: 2': ['a: 2']}
- component_subgraphs(return_singletons=False, name=None)[source]
Same as s_component_subgraphs() with s=1. Returns iterator.
See also
- components(edges=False)[source]
Same as s_connected_components() with s=1, but nodes are returned by default. Returns iterator.
See also
- connected_component_subgraphs(return_singletons=True, name=None)[source]
Same as s_component_subgraphs() with s=1. Returns iterator.
See also
- connected_components(edges=False)[source]
Same as s_connected_components() with s=1, but nodes are returned by default. Returns iterator.
See also
- property dataframe
Returns dataframe of incidence pairs and their properties.
- Return type:
pd.DataFrame
- degree(node, s=1, max_size=None)[source]
The number of edges of size at least s (and at most max_size, if given) that contain node.
- Parameters:
node (hashable) – identifier for the node.
s (positive integer, optional, default 1) – smallest size of edge to consider in degree
max_size (positive integer or None, optional, default = None) – largest size of edge to consider in degree
- Return type:
int
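Read from the parameter descriptions, the degree counts edges containing the node whose size lies between s and max_size. A minimal sketch under that reading, using a plain incidence dict rather than a Hypergraph object (the function name and argument layout are assumptions):

```python
def degree(incidence, node, s=1, max_size=None):
    """Count edges containing node whose size is >= s and, if given, <= max_size."""
    count = 0
    for members in incidence.values():
        size = len(set(members))
        if node in members and size >= s and (max_size is None or size <= max_size):
            count += 1
    return count


incidence = {"e1": [1, 2], "e2": [1, 2], "e3": [1, 2, 3], "e4": [3]}
deg_default = degree(incidence, 3)
deg_s2 = degree(incidence, 3, s=2)
deg_bounded = degree(incidence, 1, s=2, max_size=2)
```

Node 3 lies in e3 and in the singleton e4, so raising s to 2 drops the singleton from the count.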
- diameter(s=1)[source]
Returns the length of the longest shortest s-walk between nodes in hypergraph
- Parameters:
s (int, optional, default 1) –
- Returns:
diameter
- Return type:
int
- Raises:
HyperNetXError – If hypergraph is not s-edge-connected
Notes
Two nodes are s-adjacent if they share s edges. Two nodes v_start and v_end are s-walk connected if there is a sequence of nodes v_start, v_1, v_2, … v_n-1, v_end such that consecutive nodes are s-adjacent. If the graph is not connected, an error will be raised.
- distance(source, target, s=1)[source]
Returns the shortest s-walk distance between two nodes in the hypergraph.
- Parameters:
source (node.uid or node) – a node in the hypergraph
target (node.uid or node) – a node in the hypergraph
s (positive integer) – the number of edges
- Returns:
s-walk distance
- Return type:
int
See also
Notes
The s-distance is the shortest s-walk length between the nodes. An s-walk between nodes is a sequence of nodes that pairwise share at least s edges. The length of the shortest s-walk is 1 less than the number of nodes in the path sequence.
Uses the networkx shortest_path_length method on the graph generated by the s-adjacency matrix.
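The s-walk distance can be illustrated with a breadth-first search over s-adjacent nodes. This is a plain-Python sketch rather than the networkx-based computation described above; the function name and the choice to return infinity when no s-walk exists are assumptions.

```python
from collections import deque


def s_distance(incidence, source, target, s=1):
    """Shortest s-walk length between two nodes; inf if they are not s-connected."""
    memberships = {}
    for edge, members in incidence.items():
        for node in members:
            memberships.setdefault(node, set()).add(edge)
    seen = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return seen[current]
        for other, edges in memberships.items():
            # Two nodes are s-adjacent when they share at least s edges.
            if other not in seen and len(memberships[current] & edges) >= s:
                seen[other] = seen[current] + 1
                queue.append(other)
    return float("inf")


incidence = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [4, 5]}
d_1_5 = s_distance(incidence, 1, 5)
d_2_3_s2 = s_distance(incidence, 2, 3, s=2)
d_1_5_s2 = s_distance(incidence, 1, 5, s=2)
```

The walk 1, 2, 4, 5 has four nodes, so its length is 3, one less than the number of nodes in the sequence.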
- dual(name=None, switch_names=True)[source]
Constructs a new hypergraph with roles of edges and nodes of hypergraph reversed.
- Parameters:
name (hashable, optional) –
switch_names (bool, optional, default = True) – reverses edge_col and node_col names unless edge_col = ‘edges’ and node_col = ‘nodes’
- Return type:
hypergraph
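Reversing the roles of edges and nodes amounts to inverting the incidence dictionary. A minimal sketch on a plain dict (the function name is an assumption; names, properties, and switch_names are not modeled):

```python
def dual(incidence):
    """Invert an incidence dict: each node becomes an edge containing its old edges."""
    result = {}
    for edge, members in incidence.items():
        for node in members:
            result.setdefault(node, []).append(edge)
    return result


incidence = {"e1": [1, 2], "e2": [2, 3]}
dual_incidence = dual(incidence)
```

Applying dual twice to this example recovers the original incidence dictionary.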
- edge_adjacency_matrix(s=1, index=False)[source]
The s-adjacency matrix for the dual hypergraph.
- Parameters:
s (int, optional, default 1) –
index (boolean, optional, default = False) – if True, will return the index of ids for rows and columns
- Returns:
edge_adjacency_matrix (scipy.sparse.csr.csr_matrix)
edge_index (list) – index of ids for rows and columns
Notes
This is also the adjacency matrix for the line graph. Two edges are s-adjacent if they share at least s nodes. If remove_zeros is True will return the auxiliary matrix
- edge_diameter(s=1)[source]
Returns the length of the longest shortest s-walk between edges in hypergraph
- Parameters:
s (int, optional, default 1) –
- Returns:
edge_diameter
- Return type:
int
- Raises:
HyperNetXError – If hypergraph is not s-edge-connected
Notes
Two edges are s-adjacent if they share s nodes. Two edges e_start and e_end are s-walk connected if there is a sequence of edges e_start, e_1, e_2, … e_n-1, e_end such that consecutive edges are s-adjacent. If the graph is not connected, an error will be raised.
- edge_diameters(s=1)[source]
Returns the edge diameters of the s_edge_connected component subgraphs in hypergraph.
- Parameters:
s (int, optional, default 1) –
- Returns:
maximum diameter (int)
list of diameters (list) – List of edge_diameters for s-edge component subgraphs in hypergraph
list of component (list) – List of the edge uids in the s-edge component subgraphs.
- edge_distance(source, target, s=1)[source]
Returns the shortest s-walk distance between two edges in the hypergraph.
XX TODO: still need to return path and translate into user defined nodes and edges
- Parameters:
source (edge.uid or edge) – an edge in the hypergraph
target (edge.uid or edge) – an edge in the hypergraph
s (positive integer) – the number of intersections between pairwise consecutive edges
TODO (add edge weights) –
weight (None or string, optional, default = None) – if None then all edges have weight 1. If string then edge attribute string is used if available.
- Returns:
s-walk distance – A shortest s-walk is computed as a sequence of edges; the s-walk distance is the number of edges in the sequence minus 1. If no such path exists, returns np.inf.
- Return type:
the shortest s-walk edge distance
See also
Notes
The s-distance is the shortest s-walk length between the edges. An s-walk between edges is a sequence of edges such that consecutive pairwise edges intersect in at least s nodes. The length of the shortest s-walk is 1 less than the number of edges in the path sequence.
Uses the networkx shortest_path_length method on the graph generated by the s-edge_adjacency matrix.
- edge_neighbors(edge, s=1)[source]
The edges in hypergraph which share s node(s) with edge.
- Parameters:
edge (hashable or Entity) – uid for an edge in hypergraph or the edge Entity
s (int, list, optional, default = 1) – Minimum number of nodes shared by neighbors with edge.
- Returns:
List of edge neighbors
- Return type:
list
- property edge_props
Dataframe of edge properties indexed on edge ids
- Return type:
pd.DataFrame
- classmethod from_bipartite(B, set_names=('edges', 'nodes'), name=None, **kwargs)[source]
Class method that creates a Hypergraph from a bipartite graph.
- Parameters:
B (nx.Graph()) – A networkx bipartite graph. Each node in the graph has a property ‘bipartite’ taking the value of 0 or 1 indicating a 2-coloring of the graph.
set_names (iterable of length 2, optional, default = ['edges','nodes']) – Category names assigned to the graph nodes associated to each bipartite set
name (hashable, optional) –
- Return type:
Notes
A partition for the nodes in a bipartite graph generates a hypergraph.
>>> import networkx as nx
>>> B = nx.Graph()
>>> B.add_nodes_from([1, 2, 3, 4], bipartite=0)
>>> B.add_nodes_from(['a', 'b', 'c'], bipartite=1)
>>> B.add_edges_from([(1, 'a'), (1, 'b'), (2, 'b'), (2, 'c'),
...                   (3, 'c'), (4, 'a')])
>>> H = Hypergraph.from_bipartite(B)
>>> H.nodes, H.edges
# output: (EntitySet(_:Nodes,[1, 2, 3, 4],{}),
#          EntitySet(_:Edges,['b', 'c', 'a'],{}))
- classmethod from_incidence_dataframe(df, columns=None, rows=None, edge_col: str = 'edges', node_col: str = 'nodes', name=None, fillna=0, transpose=False, transforms=[], key=None, return_only_dataframe=False, **kwargs)[source]
Create a hypergraph from a Pandas Dataframe object, which has values equal to the incidence matrix of a hypergraph. Its index will identify the nodes and its columns will identify its edges.
- Parameters:
df (pandas.DataFrame) – a real valued dataframe with a single index
columns ((optional) list, default = None) – restricts df to the columns with headers in this list.
rows ((optional) list, default = None) – restricts df to the rows indexed by the elements in this list.
name ((optional) string, default = None) –
fillna (float, default = 0) – a real value to place in empty cell, all-zero columns will not generate an edge.
transpose ((optional) bool, default = False) – option to transpose the dataframe; in this case df.index will identify the edges and df.columns will identify the nodes. Transpose is applied before transforms and key.
transforms ((optional) list, default = []) – optional list of transformations to apply to each column, of the dataframe using pd.DataFrame.apply(). Transformations are applied in the order they are given (ex. abs). To apply transforms to rows or for additional functionality, consider transforming df using pandas.DataFrame methods prior to generating the hypergraph.
key ((optional) function, default = None) – boolean function to be applied to dataframe. will be applied to entire dataframe.
return_only_dataframe ((optional) bool, default = False) – to use the incidence_dataframe with cell_properties or properties, set this to true and use it as the setsystem in the Hypergraph constructor.
See also
- Return type:
- classmethod from_incidence_matrix(M, node_names=None, edge_names=None, node_label='nodes', edge_label='edges', name=None, key=None, **kwargs)[source]
Same as from_numpy_array.
- classmethod from_numpy_array(M, node_names=None, edge_names=None, node_label='nodes', edge_label='edges', name=None, key=None, **kwargs)[source]
Create a hypergraph from a real valued matrix represented as a 2 dimensional numpy array. The matrix is converted to a matrix of 0’s and 1’s so that any truthy cells are converted to 1’s and all others to 0’s.
- Parameters:
M (real valued array-like object, 2 dimensions) – representing a real valued matrix with rows corresponding to nodes and columns to edges
node_names (object, array-like, default=None) – List of node names must be the same length as M.shape[0]. If None then the node names correspond to row indices with ‘v’ prepended.
edge_names (object, array-like, default=None) – List of edge names must have the same length as M.shape[1]. If None then the edge names correspond to column indices with ‘e’ prepended.
name (hashable) –
key ((optional) function) – boolean function to be evaluated on each cell of the array, must be applicable to numpy.array
- Return type:
Note
The constructor does not generate empty edges. All zero columns in M are removed and the names corresponding to these edges are discarded.
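The conversion rules above (truthy cells become incidences, default names prefixed with 'v' and 'e', all-zero columns dropped) can be sketched on a nested list standing in for the numpy array. The function name is hypothetical, and treating key as a per-cell predicate is an assumption of this sketch.

```python
def incidence_from_matrix(M, node_names=None, edge_names=None, key=None):
    """Build an incidence dict from a rows-are-nodes, columns-are-edges matrix."""
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    node_names = node_names or [f"v{i}" for i in range(n_rows)]
    edge_names = edge_names or [f"e{j}" for j in range(n_cols)]
    keep = key if key is not None else bool  # default: any truthy cell counts
    incidence = {}
    for j, edge in enumerate(edge_names):
        members = [node_names[i] for i in range(n_rows) if keep(M[i][j])]
        if members:  # all-zero columns do not generate an edge
            incidence[edge] = members
    return incidence


M = [[1, 0, 0.5],
     [1, 0, 0],
     [0, 0, 2]]
default_incidence = incidence_from_matrix(M)
thresholded = incidence_from_matrix(M, key=lambda x: x >= 1)
```

The middle column is all zeros, so no edge named e1 appears in either result.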
- get_cell_properties(edge: str, node: str, prop_name: str | None = None) Any | dict[str, Any] [source]
Get cell properties on a specified edge and node
- Parameters:
edge (str) – edgeid
node (str) – nodeid
prop_name (str, optional) – name of a cell property; if None, all cell properties will be returned
- Returns:
cell property value if prop_name is provided, otherwise dict of all cell properties and values
- Return type:
int or str or dict of {str: any}
- get_linegraph(s=1, edges=True)[source]
Creates an s-linegraph for the Hypergraph. If edges=True (default), then the edges will be the vertices of the line graph. Two vertices are connected by an s-line-graph edge if the corresponding hypergraph edges intersect in at least s hypergraph nodes. If edges=False, the hypergraph nodes will be the vertices of the line graph. Two vertices are connected if the nodes they correspond to share at least s incident hyperedges.
- Parameters:
s (int) – The width of the connections.
edges (bool, optional, default = True) – Determine if edges or nodes will be the vertices in the linegraph.
- Returns:
A NetworkX graph.
- Return type:
nx.Graph
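A small sketch of the s-linegraph rule, returning a vertex list and a link list instead of a NetworkX graph. The function name and return layout are assumptions for illustration; for edges=False the sketch first builds the node memberships and applies the same intersection test.

```python
from itertools import combinations


def s_linegraph(incidence, s=1, edges=True):
    """Return (vertices, links) of the s-linegraph on edges, or on nodes if edges=False."""
    sets = {k: set(v) for k, v in incidence.items()}
    if not edges:
        # Use node memberships so that nodes become the line graph vertices.
        memberships = {}
        for edge, members in sets.items():
            for node in members:
                memberships.setdefault(node, set()).add(edge)
        sets = memberships
    vertices = sorted(sets)
    links = [
        (a, b)
        for a, b in combinations(vertices, 2)
        if len(sets[a] & sets[b]) >= s  # connected when they share at least s items
    ]
    return vertices, links


S = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {5, 6}, "D": {6}}
edge_vertices, edge_links = s_linegraph(S, s=1)
_, edge_links_s2 = s_linegraph(S, s=2)
_, node_links_s2 = s_linegraph(S, s=2, edges=False)
```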
- get_properties(id, level=None, prop_name=None)[source]
Returns an object’s specific property or all properties
- Parameters:
id (hashable) – edge or node id
level (int | None , optional, default = None) – if separate edge and node properties then enter 0 for edges and 1 for nodes.
prop_name (str | None, optional, default = None) – if None then all properties associated with the object will be returned.
- Returns:
single property or dictionary of properties
- Return type:
str or dict
- incidence_dataframe(sort_rows=False, sort_columns=False, cell_weights=True)[source]
Returns a pandas dataframe for hypergraph indexed by the nodes and with column headers given by the edge names.
- Parameters:
sort_rows (bool, optional, default = False) – sort rows based on hashable node names
sort_columns (bool, optional, default = False) – sort columns based on hashable edge names
cell_weights (bool, optional, default =True) –
- property incidence_dict
Dictionary keyed by edge uids with values the uids of nodes in each edge
- Return type:
dict
- incidence_matrix(weights=False, index=False)[source]
An incidence matrix for the hypergraph indexed by nodes x edges.
- Parameters:
weights (bool, default =False) – If False all nonzero entries are 1. If True and self.static all nonzero entries are filled by self.edges.cell_weights dictionary values.
index (boolean, optional, default = False) – If True return will include a dictionary of node uid : row number and edge uid : column number
- Returns:
incidence_matrix (scipy.sparse.csr.csr_matrix or np.ndarray)
row_index (list) – index of node ids for rows
col_index (list) – index of edge ids for columns
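For orientation, the nodes x edges layout and the two index lists can be sketched with nested lists. This is not the sparse-matrix implementation; the function name, the sorted index order, and the cell_weights dict keyed by (edge, node) are assumptions of the example.

```python
def incidence_matrix(incidence, cell_weights=None):
    """Return (matrix, row_index, col_index) with rows = nodes and columns = edges."""
    row_index = sorted({n for members in incidence.values() for n in members})
    col_index = sorted(incidence)
    row_pos = {n: i for i, n in enumerate(row_index)}
    matrix = [[0] * len(col_index) for _ in row_index]
    for j, edge in enumerate(col_index):
        for node in incidence[edge]:
            # Unweighted cells are 1; weighted cells use the supplied value (default 1).
            value = 1 if cell_weights is None else cell_weights.get((edge, node), 1)
            matrix[row_pos[node]][j] = value
    return matrix, row_index, col_index


incidence = {"e1": [1, 2], "e2": [2, 3]}
M, rows, cols = incidence_matrix(incidence)
W, _, _ = incidence_matrix(incidence, cell_weights={("e1", 1): 0.5})
```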
- is_connected(s=1, edges=False)[source]
Determines if hypergraph is s-connected.
- Parameters:
s (int, optional, default 1) –
edges (boolean, optional, default = False) – If True, will determine if s-edge-connected. For s=1 s-edge-connected is the same as s-connected.
- Returns:
is_connected
- Return type:
boolean
Notes
A hypergraph is s node connected if for any two nodes v0,vn there exists a sequence of nodes v0,v1,v2,…,v(n-1),vn such that every consecutive pair of nodes v(i),v(i+1) share at least s edges.
A hypergraph is s edge connected if for any two edges e0,en there exists a sequence of edges e0,e1,e2,…,e(n-1),en such that every consecutive pair of edges e(i),e(i+1) share at least s nodes.
- neighbors(node, s=1)[source]
The nodes in hypergraph which share s edge(s) with node.
- Parameters:
node (hashable or Entity) – uid for a node in hypergraph or the node Entity
s (int, list, optional, default = 1) – Minimum number of edges shared by neighbors with node.
- Returns:
neighbors – s-neighbors share at least s edges in the hypergraph
- Return type:
list
- node_diameters(s=1)[source]
Returns the node diameters of the connected components in hypergraph.
- Returns:
list of the diameters of the s-components and list of the s-component nodes
- property node_props
Dataframe of node properties indexed on node ids
- Return type:
pd.DataFrame
- number_of_edges(edgeset=None)[source]
The number of edges in edgeset belonging to hypergraph.
- Parameters:
edgeset (an iterable of Entities, optional, default = None) – If None, then return the number of edges in hypergraph.
- Returns:
number_of_edges
- Return type:
int
- number_of_nodes(nodeset=None)[source]
The number of nodes in nodeset belonging to hypergraph.
- Parameters:
nodeset (an iterable of Entities, optional, default = None) – If None, then return the number of nodes in hypergraph.
- Returns:
number_of_nodes
- Return type:
int
- property properties
Returns dataframe of edge and node properties.
- Return type:
pd.DataFrame
- remove(keys, level=None, name=None)[source]
Creates a new hypergraph with nodes and/or edges indexed by keys removed. More efficient for creating a restricted hypergraph if the restricted set is greater than what is being removed.
- Parameters:
keys (list | tuple | set | Hashable) – node and/or edge id(s) to remove
level (None, optional) – Enter 0 to remove edges with ids in keys. Enter 1 to remove nodes with ids in keys. If None then all objects in nodes and edges with the id will be removed.
name (str, optional) – Name of new hypergraph
- Return type:
hnx.Hypergraph
- remove_singletons(name=None)[source]
Constructs clone of hypergraph with singleton edges removed.
- Returns:
new hypergraph
- Return type:
- restrict_to_edges(edges, name=None)[source]
New hypergraph gotten by restricting to edges
- Parameters:
edges (Iterable) – edgeids to restrict to
- Return type:
hnx.Hypergraph
- restrict_to_nodes(nodes, name=None)[source]
New hypergraph gotten by restricting to nodes
- Parameters:
nodes (Iterable) – nodeids to restrict to
- Return type:
hnx.Hypergraph
- s_component_subgraphs(s=1, edges=True, return_singletons=False, name=None)[source]
Returns a generator for the induced subgraphs of s_connected components. Removes singletons unless return_singletons is set to True. Computed using s-linegraph generated either by the hypergraph (edges=True) or its dual (edges = False)
- Parameters:
s (int, optional, default 1) –
edges (boolean, optional, default = True) – Determines if edge or node components are desired. Returns subgraphs equal to the hypergraph restricted to each set of nodes (edges) in the s-connected components or s-edge-connected components
return_singletons (bool, optional) –
- Yields:
s_component_subgraphs (iterator) – Iterator returns subgraphs generated by the edges (or nodes) in the s-edge(node) components of hypergraph.
- s_components(s=1, edges=True, return_singletons=True)[source]
Same as s_connected_components
- s_connected_components(s=1, edges=True, return_singletons=False)[source]
Returns a generator for the s-edge-connected components or the s-node-connected components of the hypergraph.
- Parameters:
s (int, optional, default 1) –
edges (boolean, optional, default = True) – If True will return edge components, if False will return node components
return_singletons (bool, optional, default = False) –
Notes
If edges=True, this method returns the s-edge-connected components as sets of edge uids. An s-edge-component has the property that for any two edges e1 and e2 there is a sequence of edges starting with e1 and ending with e2 such that pairwise adjacent edges in the sequence intersect in at least s nodes. If s=1 these are the path components of the hypergraph.
If edges=False, this method returns the s-node-connected components as sets of uids of nodes that are s-walk-connected. Two nodes v1 and v2 are s-walk-connected if there is a sequence of nodes starting with v1 and ending with v2 such that pairwise adjacent nodes in the sequence share at least s edges. If s=1 these are the path components of the hypergraph.
Example
>>> S = {'A':{1,2,3},'B':{2,3,4},'C':{5,6},'D':{6}}
>>> H = Hypergraph(S)
>>> list(H.s_components(edges=True))
[{'C', 'D'}, {'A', 'B'}]
>>> list(H.s_components(edges=False))
[{1, 2, 3, 4}, {5, 6}]
- Yields:
s_connected_components (iterator) – Iterator returns sets of uids of the edges (or nodes) in the s-edge(node) components of hypergraph.
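The definition of s-edge-connected components in the Notes can be illustrated without the library. The sketch below is not HyperNetX's implementation; the function name `s_edge_components` is hypothetical, and it assumes that a "singleton" is a component containing a single edge. It links two edges when they share at least s nodes and then collects connected groups with a depth-first search.

```python
from itertools import combinations


def s_edge_components(setsystem, s=1, return_singletons=False):
    """Yield sets of edge ids that are s-edge-connected.

    Hypothetical helper: two edges are linked when they share at least
    ``s`` nodes, and components are found by a depth-first search.
    """
    edges = {e: set(nodes) for e, nodes in setsystem.items()}
    # Build the s-line graph: edge ids are vertices, linked if they share >= s nodes.
    adjacent = {e: set() for e in edges}
    for e1, e2 in combinations(edges, 2):
        if len(edges[e1] & edges[e2]) >= s:
            adjacent[e1].add(e2)
            adjacent[e2].add(e1)
    seen = set()
    for start in edges:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            e = stack.pop()
            if e in component:
                continue
            component.add(e)
            stack.extend(adjacent[e] - component)
        seen |= component
        # Components with a single edge are skipped unless requested.
        if len(component) > 1 or return_singletons:
            yield component


S = {'A': {1, 2, 3}, 'B': {2, 3, 4}, 'C': {5, 6}, 'D': {6}}
components_s1 = list(s_edge_components(S, s=1))
components_s2 = list(s_edge_components(S, s=2))
```

With s=1 the edges group as {'A', 'B'} and {'C', 'D'}; with s=2 only 'A' and 'B' remain linked, because 'C' and 'D' share a single node.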
- set_state(**kwargs)[source]
Allow state_dict updates from outside of class. Use with caution.
- Parameters:
**kwargs – key=value pairs to save in state dictionary
- property shape
(number of nodes, number of edges)
- Return type:
tuple
- singletons()[source]
Returns a list of singleton edges. A singleton edge is an edge of size 1 with a node of degree 1.
- Returns:
singles – A list of edge uids.
- Return type:
list
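As an illustration of the singleton definition above (an edge of size 1 whose only node has degree 1), here is a small library-independent sketch. The helper name `singleton_edges` is hypothetical and the set system is made-up example data.

```python
def singleton_edges(setsystem):
    """Return ids of edges of size 1 whose only node has degree 1."""
    # Degree of a node = number of edges that contain it.
    degree = {}
    for nodes in setsystem.values():
        for node in set(nodes):
            degree[node] = degree.get(node, 0) + 1
    singles = []
    for edge, nodes in setsystem.items():
        members = set(nodes)
        if len(members) == 1 and degree[next(iter(members))] == 1:
            singles.append(edge)
    return singles


S = {'A': {1, 2, 3}, 'B': {2, 3, 4}, 'C': {5, 6}, 'D': {6}, 'E': {7}}
singles = singleton_edges(S)
```

Edge 'D' has size 1 but is not a singleton under this definition, because its node 6 also belongs to 'C'; only 'E' qualifies.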
- size(edge, nodeset=None)[source]
The number of nodes in nodeset that belong to edge. If nodeset is None then returns the size of edge
- Parameters:
edge (hashable) – The uid of an edge in the hypergraph
- Returns:
size
- Return type:
int
- toplexes(name=None)[source]
Returns a simple hypergraph corresponding to self.
Warning
Collapsing is no longer supported inside the toplexes method. Instead generate a new collapsed hypergraph and compute the toplexes of the new hypergraph.
- Parameters:
name (str, optional, default = None) –
Module contents
- class classes.Entity(entity: DataFrame | Mapping[T, Iterable[T]] | Iterable[Iterable[T]] | None = None, data_cols: Sequence[T] = [0, 1], data: ndarray | None = None, static: bool = False, labels: OrderedDict[T, Sequence[T]] | None = None, uid: Hashable | None = None, weight_col: str | int | None = 'cell_weights', weights: Sequence[float] | float | int | str | None = 1, aggregateby: str | dict | None = 'sum', properties: DataFrame | dict[int, dict[T, dict[Any, Any]]] | None = None, misc_props_col: str = 'properties', level_col: str = 'level', id_col: str = 'id')[source]
Bases:
object
Base class for handling N-dimensional data when building network-like models, i.e., Hypergraph
- Parameters:
entity (pandas.DataFrame, dict of lists or sets, list of lists or sets, optional) – If a DataFrame with N columns, represents N-dimensional entity data (data table). Otherwise, represents 2-dimensional entity data (system of sets). TODO: Test for compatibility with list of Entities and update docs
data (numpy.ndarray, optional) – 2D M x N ndarray of ints (data table); sparse representation of an N-dimensional incidence tensor with M nonzero cells. Ignored if entity is provided.
static (bool, default=False) – If True, entity data may not be altered, and the state_dict will never be cleared. Otherwise, rows may be added to and removed from the data table, and updates will clear the state_dict.
labels (collections.OrderedDict of lists, optional) – User-specified labels in corresponding order to ints in data. Ignored if entity is provided or data is not provided.
uid (hashable, optional) – A unique identifier for the object
weights (str or sequence of float, optional) – User-specified cell weights corresponding to entity data.
- If sequence of floats and entity or data defines a data table, length must equal the number of rows.
- If sequence of floats and entity defines a system of sets, length must equal the total sum of the sizes of all sets.
- If str and entity is a DataFrame, must be the name of a column in entity.
Otherwise, weight for all cells is assumed to be 1.
aggregateby ({'sum', 'last', 'count', 'mean', 'median', 'max', 'min', 'first', None}) – Name of function to use for aggregating cell weights of duplicate rows when entity or data defines a data table, default is “sum”. If None, duplicate rows will be dropped without aggregating cell weights. Effectively ignored if entity defines a system of sets.
properties (pandas.DataFrame or doubly-nested dict, optional) – User-specified properties to be assigned to individual items in the data, i.e., cell entries in a data table; sets or set elements in a system of sets. See Notes for detailed explanation. If DataFrame, each row gives [optional item level, item label, optional named properties, {property name: property value}] (order of columns does not matter; see Notes for an example). If doubly-nested dict, {item level: {item label: {property name: property value}}}.
misc_props_col (str, default="properties") – Name of the column for miscellaneous properties in properties; see Notes for explanation.
level_col (str, default="level") – Name of the column for the level index in properties; see Notes for explanation.
id_col (str, default="id") – Name of the column for the item name in properties; see Notes for explanation.
Notes
A property is a named attribute assigned to a single item in the data.
You can pass a table of properties to properties as a DataFrame:
| Level (optional) | ID | [explicit property type] | […] | misc. properties |
|---|---|---|---|---|
| 0 | level 0 item | property value | … | {property name: property value} |
| 1 | level 1 item | property value | … | {property name: property value} |
| … | … | … | … | … |
| N | level N item | property value | … | {property name: property value} |
The Level column is optional. If not provided, properties will be assigned by ID (i.e., if an ID appears at multiple levels, the same properties will be assigned to all occurrences).
The names of the Level (if provided) and ID columns must be specified by level_col and id_col. misc_props_col can be used to specify the name of the column to be used for miscellaneous properties; if no column by that name is found, a new column will be created and populated with empty dicts. All other columns will be considered explicit property types. The order of the columns does not matter.
This method assumes that there are no rows with the same (Level, ID); if duplicates are found, all but the first occurrence will be dropped.
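The two rules in these Notes (a missing miscellaneous column is created with empty dicts, and only the first row for each (Level, ID) pair is kept) can be sketched in plain Python. This is only an illustration of the described behavior, not the library's code; `normalize_properties`, the `color` column, and the sample rows are hypothetical.

```python
def normalize_properties(records, level_col="level", id_col="id",
                         misc_props_col="properties"):
    """Normalize property rows given as dicts (one dict per table row)."""
    seen = set()
    table = []
    for record in records:
        key = (record[level_col], record[id_col])
        if key in seen:
            continue  # all but the first occurrence of (level, id) are dropped
        seen.add(key)
        row = dict(record)
        # Create the misc. properties entry if the row does not have one.
        row.setdefault(misc_props_col, {})
        table.append(row)
    return table


records = [
    {"level": 0, "id": "e1", "color": "red"},
    {"level": 1, "id": "n1", "color": "blue", "properties": {"note": "hub"}},
    {"level": 0, "id": "e1", "color": "green"},  # duplicate (level, id)
]
table = normalize_properties(records)
```

Here `color` plays the role of an explicit property type, while anything without its own column would go into the `properties` dict.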
- add(*args)[source]
Updates the underlying data table with new entity data from multiple sources
- Parameters:
*args – variable length argument list of Entity and/or representations of entity data
- Returns:
self
- Return type:
Entity
Warning
Adding an element directly to an Entity will not add the element to any Hypergraphs constructed from that Entity, and will cause an error. Use Hypergraph.add_edge or Hypergraph.add_node_to_edge instead.
- add_element(data)[source]
Updates the underlying data table with new entity data
Supports adding from either an existing Entity or a representation of entity (data table or labeled system of sets are both supported representations)
- Parameters:
data (Entity, pandas.DataFrame, or dict of lists or sets) – new entity data
- Returns:
self
- Return type:
Entity
Warning
Adding an element directly to an Entity will not add the element to any Hypergraphs constructed from that Entity, and will cause an error. Use Hypergraph.add_edge or Hypergraph.add_node_to_edge instead.
See also
add
takes multiple sources of new entity data as variable length argument list
Hypergraph.add_edge, Hypergraph.add_node_to_edge
- add_elements_from(arg_set)[source]
Adds arguments from an iterable to the data table one at a time
Deprecated since version 2.0.0: Duplicates add
- Parameters:
arg_set (iterable) – list of Entity and/or representations of entity data
- Returns:
self
- Return type:
Entity
- assign_properties(props: DataFrame | dict[int, dict[T, dict[Any, Any]]], misc_col: str | None = None, level_col=0, id_col=1) None [source]
Assign new properties to items in the data table, update properties
- Parameters:
props (pandas.DataFrame or doubly-nested dict) – See documentation of the properties parameter in Entity
level_col (str, optional) – name of the column corresponding to the levels; if None, defaults to _level_col.
id_col (str, optional) – name of the column corresponding to the items; if None, defaults to _id_col.
misc_col (str, optional) – name of the column corresponding to the misc. properties; if None, defaults to _misc_props_col.
- property cell_weights
Cell weights corresponding to each row of the underlying data table
- Returns:
Keyed by row of data table (as a tuple)
- Return type:
dict of {tuple: int or float}
- property children
Labels of all items in level 1 (second column) of the underlying data table
- Return type:
frozenset
- property data
Sparse representation of the data table as an incidence tensor
This can also be thought of as an encoding of dataframe, where items in each column of the data table are translated to their int position in the self.labels[column] list
- Returns:
2D array of ints representing rows of the underlying data table as indices in an incidence tensor
- Return type:
numpy.ndarray
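The relationship between the data table, labels, and the integer encoding can be sketched as follows. This is a plain-Python illustration rather than the library's code: `encode_table` and `translate_row` are hypothetical helpers, and ordering labels by first appearance is an assumption (the library may order them differently).

```python
def encode_table(rows, columns):
    """Return (labels, data) for a table given as a list of row tuples."""
    # labels[column] lists the distinct items of that column in first-seen order.
    labels = {col: [] for col in columns}
    position = {col: {} for col in columns}
    data = []
    for row in rows:
        encoded = []
        for col, item in zip(columns, row):
            if item not in position[col]:
                position[col][item] = len(labels[col])
                labels[col].append(item)
            encoded.append(position[col][item])
        data.append(tuple(encoded))
    return labels, data


def translate_row(labels, columns, coords):
    """Inverse of the encoding for one row of int positions."""
    return [labels[col][i] for col, i in zip(columns, coords)]


columns = ["edges", "nodes"]
rows = [("A", 1), ("A", 2), ("B", 2), ("B", 3)]
labels, data = encode_table(rows, columns)
```

Each encoded row holds one position per column, so `translate_row(labels, columns, data[3])` recovers the original row `["B", 3]`.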
- property dataframe
The underlying data table stored by the Entity
- Return type:
pandas.DataFrame
- property dimensions
Dimensions of data i.e., the number of distinct items in each level (column) of the underlying data table
- Returns:
Length and order corresponds to columns of self.dataframe (excluding cell weight column)
- Return type:
tuple of ints
- property dimsize
Number of levels (columns) in the underlying data table
- Returns:
Equal to length of self.dimensions
- Return type:
int
- property elements
System of sets representation of the first two levels (columns) of the underlying data table
Each item in level 0 (first column) defines a set containing all the level 1 (second column) items with which it appears in the same row of the underlying data table
- Returns:
System of sets representation as dict of {level 0 item : AttrList(level 1 items)}
- Return type:
dict of AttrList
See also
incidence_dict
same data as dict of list
memberships
dual of this representation, i.e., each item in level 1 (second column) defines a set
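To make the elements and memberships views concrete, the sketch below groups the rows of a small two-column table both ways. It uses plain lists instead of AttrList, and `system_of_sets` is a hypothetical helper rather than a library function.

```python
def system_of_sets(rows, set_level=0, element_level=1):
    """Group items of ``element_level`` by the ``set_level`` item in the same row."""
    grouped = {}
    for row in rows:
        members = grouped.setdefault(row[set_level], [])
        if row[element_level] not in members:
            members.append(row[element_level])
    return grouped


rows = [("A", 1), ("A", 2), ("B", 2), ("B", 3)]
elements = system_of_sets(rows)           # level 0 item -> level 1 items
memberships = system_of_sets(rows, 1, 0)  # dual: level 1 item -> level 0 items
```

Swapping the two level indices is all that distinguishes the dual view from the original one.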
- elements_by_column(col1, col2)[source]
System of sets representation of two columns (levels) of the underlying data table
Each item in col1 defines a set containing all the col2 items with which it appears in the same row of the underlying data table
Properties can be accessed and assigned to items in col1
- Parameters:
col1 (Hashable) – name of column whose items define sets
col2 (Hashable) – name of column whose items are elements in the system of sets
- Returns:
System of sets representation as dict of {col1 item : AttrList(col2 items)}
- Return type:
dict of AttrList
See also
elements_by_level
same functionality, takes level indices instead of column names
- elements_by_level(level1, level2)[source]
System of sets representation of two levels (columns) of the underlying data table
Each item in level1 defines a set containing all the level2 items with which it appears in the same row of the underlying data table
Properties can be accessed and assigned to items in level1
- Parameters:
level1 (int) – index of level whose items define sets
level2 (int) – index of level whose items are elements in the system of sets
- Returns:
System of sets representation as dict of {level1 item : AttrList(level2 items)}
- Return type:
dict of AttrList
See also
elements_by_column
same functionality, takes column names instead of level indices
- property empty
Whether the underlying data table is empty or not
- Return type:
bool
- encode(data)[source]
Encode dataframe to numpy array
- Parameters:
data (dataframe) –
- Return type:
numpy.array
- get_properties(item: T, level: int | None = None) dict[Any, Any] [source]
Get all properties of an item
- Parameters:
item (hashable) – name of an item
level (int, optional) – level index of the item
- Returns:
prop_vals – {named property: property value, ..., misc. property column name: {property name: property value}}
- Return type:
dict
- Raises:
KeyError – if (level, item) is not in properties, or if level is not provided and item is not in properties
- Warns:
UserWarning – If level is not provided and item appears in multiple levels, assumes the first (closest to 0)
- get_property(item: T, prop_name: Any, level: int | None = None) Any [source]
Get a property of an item
- Parameters:
item (hashable) – name of an item
prop_name (hashable) – name of the property to get
level (int, optional) – level index of the item
- Returns:
prop_val – value of the property
- Return type:
any
- Raises:
KeyError – if (level, item) is not in properties, or if level is not provided and item is not in properties
- Warns:
UserWarning – If level is not provided and item appears in multiple levels, assumes the first (closest to 0)
- property incidence_dict: dict[T, list[T]]
System of sets representation of the first two levels (columns) of the underlying data table
- Returns:
System of sets representation as dict of {level 0 item : [level 1 items]}
- Return type:
dict of list
See also
elements
same data as dict of AttrList
- incidence_matrix(level1=0, level2=1, weights=False, aggregateby=None, index=False) csr_matrix | None [source]
Incidence matrix representation for two levels (columns) of the underlying data table
If level1 and level2 contain N and M distinct items, respectively, the incidence matrix will be M x N. In other words, the items in level1 and level2 correspond to the columns and rows of the incidence matrix, respectively, in the order in which they appear in self.labels[column1] and self.labels[column2] (column1 and column2 are the column labels of level1 and level2)
- Parameters:
level1 (int, default=0) – index of first level (column)
level2 (int, default=1) – index of second level
weights (bool or dict, default=False) – If False, all nonzero entries are 1. If True, all nonzero entries are filled by self.cell_weights dictionary values; use aggregateby to specify how duplicate entries should have weights aggregated. If a dict of the form {(level1 item, level2 item): weight value}, only nonzero cells in the incidence matrix will be updated by the dictionary, i.e., level1 item and level2 item must appear in the same row at least once in the underlying data table
aggregateby ({'last', 'count', 'sum', 'mean', 'median', 'max', 'min', 'first', None}, default=None) – Method to aggregate weights of duplicate rows in data table. If None, then all cell weights will be set to 1.
index (bool, optional) – Not used
- Returns:
sparse representation of incidence matrix (i.e. Compressed Sparse Row matrix)
- Return type:
scipy.sparse.csr.csr_matrix
Note
In the context of Hypergraphs, think level1 = edges, level2 = nodes
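The row and column convention (level2 items as rows, level1 items as columns) is easy to get backwards, so here is a dense, unweighted sketch of it. It is not the library's sparse implementation; the helper `incidence_matrix` below is a hypothetical stand-in that returns a list of lists, and ordering items by first appearance is an assumption.

```python
def incidence_matrix(rows, level1=0, level2=1):
    """Dense 0/1 incidence matrix: one row per level2 item, one column per level1 item."""
    col_items, row_items = [], []
    for row in rows:
        if row[level1] not in col_items:
            col_items.append(row[level1])
        if row[level2] not in row_items:
            row_items.append(row[level2])
    matrix = [[0] * len(col_items) for _ in row_items]
    for row in rows:
        matrix[row_items.index(row[level2])][col_items.index(row[level1])] = 1
    return matrix, row_items, col_items


rows = [("A", 1), ("A", 2), ("B", 2), ("B", 3)]
matrix, node_labels, edge_labels = incidence_matrix(rows)
```

With two edges and three nodes the result is 3 x 2: the row for node 2 is `[1, 1]` because it belongs to both edges.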
- index(column, value=None)[source]
Get level index corresponding to a column and (optionally) the index of a value in that column
The index of value is its position in the list given by self.labels[column], which is used in the integer encoding of the data table self.data
- Parameters:
column (str) – name of a column in self.dataframe
value (str, optional) – label of an item in the specified column
- Returns:
level index corresponding to column, index of value if provided
- Return type:
int or (int, int)
- indices(column, values)[source]
Get indices of one or more value(s) in a column
- Parameters:
column (str) –
values (str or iterable of str) –
- Returns:
indices of values
- Return type:
list of int
See also
index
for finding level index of a column and index of a single value
- is_empty(level=0)[source]
Whether a specified level (column) of the underlying data table is empty or not
- Return type:
bool
- property isstatic
Whether to treat the underlying data as static or not
If True, the underlying data may not be altered, and the state_dict will never be cleared. Otherwise, rows may be added to and removed from the data table, and updates will clear the state_dict.
- Return type:
bool
- property labels
Labels of all items in each column of the underlying data table
- Returns:
dict of {column name: [item labels]} The order of [item labels] corresponds to the int encoding of each item in self.data.
- Return type:
dict of lists
- level(item, min_level=0, max_level=None, return_index=True)[source]
First level containing the given item label
Order of levels corresponds to order of columns in self.dataframe
- Parameters:
item (str) –
min_level (int, optional) – inclusive bounds on range of levels to search for item
max_level (int, optional) – inclusive bounds on range of levels to search for item
return_index (bool, default=True) – If True, return index of item within the level
- Returns:
index of first level containing the item, plus the index of the item within that level if return_index=True; returns None if item is not found
- Return type:
int, (int, int), or None
- property memberships
System of sets representation of the first two levels (columns) of the underlying data table
Each item in level 1 (second column) defines a set containing all the level 0 (first column) items with which it appears in the same row of the underlying data table
- Returns:
System of sets representation as dict of {level 1 item : AttrList(level 0 items)}
- Return type:
dict of AttrList
See also
elements
dual of this representation i.e., each item in level 0 (first column) defines a set
- property properties: DataFrame
Properties assigned to items in the underlying data table
- Return type:
pandas.DataFrame
- remove(*args)[source]
Removes all rows containing specified item(s) from the underlying data table
- Parameters:
*args – variable length argument list of item labels
- Returns:
self
- Return type:
Entity
See also
remove_element
remove all rows containing a single specified item
- remove_element(item)[source]
Removes all rows containing a specified item from the underlying data table
- Parameters:
item – item label
- Returns:
self
- Return type:
Entity
See also
remove
same functionality, accepts variable length argument list of item labels
- remove_elements_from(arg_set)[source]
Removes all rows containing specified item(s) from the underlying data table
Deprecated since version 2.0.0: Duplicates remove
- Parameters:
arg_set (iterable) – list of item labels
- Returns:
self
- Return type:
Entity
- restrict_to_indices(indices, level=0, **kwargs)[source]
Create a new Entity by restricting the data table to rows containing specific items in a given level
- Parameters:
indices (int or iterable of int) – indices of item label(s) in level to restrict to
level (int, default=0) – level index
**kwargs – Extra arguments to Entity constructor
- Return type:
Entity
- restrict_to_levels(levels: int | Iterable[int], weights: bool = False, aggregateby: str | None = 'sum', **kwargs) Entity [source]
Create a new Entity by restricting to a subset of levels (columns) in the underlying data table
- Parameters:
levels (array-like of int) – indices of a subset of levels (columns) of data
weights (bool, default=False) – If True, aggregate existing cell weights to get new cell weights. Otherwise, all new cell weights will be 1.
aggregateby ({'sum', 'first', 'last', 'count', 'mean', 'median', 'max', 'min', None}, optional) – Method to aggregate weights of duplicate rows in data table. If None or weights=False, then all new cell weights will be 1.
**kwargs – Extra arguments to Entity constructor
- Return type:
Entity
- Raises:
KeyError – If levels contains any invalid values
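Restricting to a subset of levels can make previously distinct rows coincide, which is why the weights and aggregateby parameters exist. The sketch below illustrates that interaction on plain tuples; the function is a hypothetical stand-in for the method, it supports only a few of the listed aggregation names, and it returns a dict of cell weights instead of a new Entity.

```python
def restrict_to_levels(rows, cell_weights, levels, weights=False, aggregateby="sum"):
    """Project rows onto ``levels`` and combine the weights of rows that coincide."""
    aggregators = {
        "sum": sum,
        "count": len,
        "max": max,
        "min": min,
        "first": lambda values: values[0],
        "last": lambda values: values[-1],
    }
    grouped = {}
    for row, weight in zip(rows, cell_weights):
        # In this sketch an out-of-range level raises IndexError.
        key = tuple(row[level] for level in levels)
        grouped.setdefault(key, []).append(weight)
    if not weights or aggregateby is None:
        return {key: 1 for key in grouped}
    return {key: aggregators[aggregateby](values) for key, values in grouped.items()}


rows = [("A", 1, "x"), ("A", 1, "y"), ("B", 2, "x")]
cell_weights = [0.5, 1.5, 2.0]
unweighted = restrict_to_levels(rows, cell_weights, levels=[0, 1])
summed = restrict_to_levels(rows, cell_weights, levels=[0, 1], weights=True)
```

Dropping the third level merges the first two rows into the single cell ("A", 1), whose weight is either reset to 1 or aggregated from 0.5 and 1.5.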
- set_property(item: T, prop_name: Any, prop_val: Any, level: int | None = None) None [source]
Set a property of an item
- Parameters:
item (hashable) – name of an item
prop_name (hashable) – name of the property to set
prop_val (any) – value of the property to set
level (int, optional) – level index of the item; required if item is not already in properties
- Raises:
ValueError – If level is not provided and item is not in properties
- Warns:
UserWarning – If level is not provided and item appears in multiple levels, assumes the first (closest to 0)
- size(level=0)[source]
The number of items in a level of the underlying data table
Equivalent to self.dimensions[level]
- Parameters:
level (int, default=0) –
- Return type:
int
- translate(level, index)[source]
Given indices of a level and value(s), return the corresponding value label(s)
- Parameters:
level (int) – level index
index (int or list of int) – value index or indices
- Returns:
label(s) corresponding to value index or indices
- Return type:
str or list of str
See also
translate_arr
translate a full row of value indices across all levels (columns)
- translate_arr(coords)[source]
Translate a full encoded row of the data table, e.g., a row of self.data
- Parameters:
coords (tuple of ints) – encoded value indices, with one value index for each level of the data
- Returns:
full row of translated value labels
- Return type:
list of str
- property uid
User-defined unique identifier for the Entity
- Return type:
hashable
- property uidset
Labels of all items in level 0 (first column) of the underlying data table
- Return type:
frozenset
- uidset_by_column(column)[source]
Labels of all items in a particular column (level) of the underlying data table
- Parameters:
column (Hashable) – Name of a column in self.dataframe
- Return type:
frozenset
See also
uidset
Labels of all items in level 0 (first column)
children
Labels of all items in level 1 (second column)
uidset_by_level
Same functionality, takes the level index instead of column name
- uidset_by_level(level)[source]
Labels of all items in a particular level (column) of the underlying data table
- Parameters:
level (int) –
- Return type:
frozenset
See also
uidset
Labels of all items in level 0 (first column)
children
Labels of all items in level 1 (second column)
uidset_by_column
Same functionality, takes the column name instead of level index
- class classes.EntitySet(entity: pd.DataFrame | np.ndarray | Mapping[T, Iterable[T]] | Iterable[Iterable[T]] | Mapping[T, Mapping[T, Mapping[T, Any]]] | None = None, data: np.ndarray | None = None, labels: OrderedDict[T, Sequence[T]] | None = None, level1: str | int = 0, level2: str | int = 1, weight_col: str | int = 'cell_weights', weights: Sequence[float] | float | int | str = 1, cell_properties: Sequence[T] | pd.DataFrame | dict[T, dict[T, dict[Any, Any]]] | None = None, misc_cell_props_col: str = 'cell_properties', uid: Hashable | None = None, aggregateby: str | None = 'sum', properties: pd.DataFrame | dict[int, dict[T, dict[Any, Any]]] | None = None, misc_props_col: str = 'properties', **kwargs)[source]
Bases:
Entity
Class for handling 2-dimensional (i.e., system of sets, bipartite) data when building network-like models, i.e., Hypergraph
- Parameters:
entity (Entity, pandas.DataFrame, dict of lists or sets, or list of lists or sets, optional) – If an Entity with N levels or a DataFrame with N columns, represents N-dimensional entity data (data table). If N > 2, only considers levels (columns) level1 and level2. Otherwise, represents 2-dimensional entity data (system of sets).
data (numpy.ndarray, optional) – 2D M x N ndarray of ints (data table); sparse representation of an N-dimensional incidence tensor with M nonzero cells. If N > 2, only considers levels (columns) level1 and level2. Ignored if entity is provided.
labels (collections.OrderedDict of lists, optional) – User-specified labels in corresponding order to ints in data. For M x N data, N > 2, labels must contain either 2 or N keys. If N keys, only considers labels for levels (columns) level1 and level2. Ignored if entity is provided or data is not provided.
level1 (str or int, default=0) – Each item in level1 defines a set containing all the level2 items with which it appears in the same row of the underlying data table. If int, gives the index of a level; if str, gives the name of a column in entity. Ignored if entity, data (if entity not provided), and labels (if provided) all represent 1- or 2-dimensional data (set or system of sets).
level2 (str or int, default=1) – Each item in level1 defines a set containing all the level2 items with which it appears in the same row of the underlying data table. If int, gives the index of a level; if str, gives the name of a column in entity. Ignored if entity, data (if entity not provided), and labels (if provided) all represent 1- or 2-dimensional data (set or system of sets).
weights (str or sequence of float, optional) – User-specified cell weights corresponding to entity data.
- If sequence of floats and entity or data defines a data table, length must equal the number of rows.
- If sequence of floats and entity defines a system of sets, length must equal the total sum of the sizes of all sets.
- If str and entity is a DataFrame, must be the name of a column in entity.
Otherwise, weight for all cells is assumed to be 1. Ignored if entity is an Entity and keep_weights=True.
keep_weights (bool, default=True) – Whether to preserve any existing cell weights; ignored if entity is not an Entity.
cell_properties (str, list of str, pandas.DataFrame, or doubly-nested dict, optional) – User-specified properties to be assigned to cells of the incidence matrix, i.e., rows in a data table; pairs of (set, element of set) in a system of sets. See Notes for detailed explanation. Ignored if underlying data is 1-dimensional (set). If doubly-nested dict, {level1 item: {level2 item: {cell property name: cell property value}}}.
misc_cell_props_col (str, default='cell_properties') – Column name for miscellaneous cell properties; see Notes for explanation.
kwargs – Keyword arguments passed to the Entity constructor, e.g., static, uid, aggregateby, properties, etc. See Entity for documentation of these parameters.
Notes
A cell property is a named attribute assigned jointly to a set and one of its elements, i.e., a cell of the incidence matrix.
When an Entity or DataFrame is passed to the entity parameter of the constructor, it should represent a data table:
| Column_1 | Column_2 | Column_3 | […] | Column_N |
|---|---|---|---|---|
| level 1 item | level 2 item | level 3 item | … | level N item |
| … | … | … | … | … |
Assuming the default values for parameters level1, level2, the data table will be restricted to the set system defined by Column 1 and Column 2. Since each row of the data table represents an incidence or cell, values from other columns may contain data that should be converted to cell properties.
By passing a column name or list of column names as cell_properties, each given column will be preserved in the cell_properties as an explicit cell property type. An additional column in cell_properties will be created to store a dict of miscellaneous cell properties, which will store cell properties of types that have not been explicitly defined and do not have a dedicated column (which may be assigned after construction). The name of the miscellaneous column is determined by misc_cell_props_col.
You can also pass a pre-constructed table to cell_properties as a DataFrame:
| Column_1 | Column_2 | [explicit cell prop. type] | […] | misc. cell properties |
|---|---|---|---|---|
| level 1 item | level 2 item | cell property value | … | {cell property name: cell property value} |
| … | … | … | … | … |
Column 1 and Column 2 must have the same names as the corresponding columns in the entity data table, and misc_cell_props_col can be used to specify the name of the column to be used for miscellaneous cell properties. If no column by that name is found, a new column will be created and populated with empty dicts. All other columns will be considered explicit cell property types. The order of the columns does not matter.
Both of these methods assume that there are no row duplicates in the tables passed to entity and/or cell_properties; if duplicates are found, all but the first occurrence will be dropped.
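The first approach described in these Notes (naming data-table columns to keep as cell properties) can be sketched in plain Python. This is an illustration of the described behavior under simplifying assumptions, not the library's code; `cell_property_table` and the column names `edge`, `node`, `role`, and `year` are made up for the example.

```python
def cell_property_table(records, level1, level2, cell_properties=(),
                        misc_cell_props_col="cell_properties"):
    """Map each (level1 item, level2 item) cell to its property row."""
    table = {}
    for record in records:
        cell = (record[level1], record[level2])
        if cell in table:
            continue  # duplicate rows: only the first occurrence is kept
        # Each requested column becomes an explicit cell property type.
        row = {name: record[name] for name in cell_properties}
        # Properties without a dedicated column live in the misc. dict.
        row[misc_cell_props_col] = {}
        table[cell] = row
    return table


records = [
    {"edge": "A", "node": 1, "role": "author", "year": 2020},
    {"edge": "A", "node": 2, "role": "editor", "year": 2021},
    {"edge": "A", "node": 1, "role": "reviewer", "year": 2022},  # duplicate cell
]
cells = cell_property_table(records, "edge", "node", cell_properties=["role"])
cells[("A", 2)]["cell_properties"]["note"] = "added after construction"
```

The `year` column is not requested, so it is dropped, and the later `note` lands in the miscellaneous dict because it has no dedicated column.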
- assign_cell_properties(cell_props: DataFrame | dict[T, dict[T, dict[Any, Any]]], misc_col: str | None = None, replace: bool = False) None [source]
Assign new properties to cells of the incidence matrix and update properties
- Parameters:
cell_props (pandas.DataFrame, dict of iterables, or doubly-nested dict, optional) – See documentation of the cell_properties parameter in EntitySet
misc_col (str, optional) – name of column to be used for miscellaneous cell property dicts
replace (bool, default=False) – If True, replace existing cell_properties with result; otherwise update with new values from result
- Raises:
AttributeError – Not supported for dimsize=1
- property cell_properties: DataFrame | None
Properties assigned to cells of the incidence matrix
- Returns:
Returns None if dimsize < 2
- Return type:
pandas.Series, optional
- collapse_identical_elements(return_equivalence_classes: bool = False, **kwargs) EntitySet | tuple[hypernetx.classes.entityset.EntitySet, dict[str, list[str]]] [source]
Create a new EntitySet by collapsing sets with the same set elements
Each item in level 0 (first column) defines a set containing all the level 1 (second column) items with which it appears in the same row of the underlying data table.
- Parameters:
return_equivalence_classes (bool, default=False) – If True, return a dictionary of equivalence classes keyed by new edge names
**kwargs – Extra arguments to EntitySet constructor
- Returns:
new_entity (EntitySet) – new EntitySet with identical sets collapsed; if all sets are unique, the system of sets will be the same as the original.
equivalence_classes (dict of lists, optional) – if return_equivalence_classes=True, {collapsed set label: [level 0 item labels]}.
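Collapsing identical sets amounts to grouping level 0 items by their element sets. The sketch below shows the idea on a plain dict; `collapse_identical_sets` is a hypothetical helper, and naming each collapsed set after the first label in its class is an assumption made for the example, not necessarily the library's naming rule.

```python
def collapse_identical_sets(setsystem):
    """Merge sets with identical elements; return (collapsed system, classes)."""
    classes_by_members = {}
    for label, members in setsystem.items():
        classes_by_members.setdefault(frozenset(members), []).append(label)
    collapsed, equivalence_classes = {}, {}
    for members, labels in classes_by_members.items():
        new_label = labels[0]  # hypothetical naming rule: first label seen
        collapsed[new_label] = set(members)
        equivalence_classes[new_label] = labels
    return collapsed, equivalence_classes


S = {'A': {1, 2}, 'B': {2, 1}, 'C': {3}}
collapsed, classes = collapse_identical_sets(S)
```

Sets 'A' and 'B' have the same elements and collapse into one, while 'C' is unique and passes through unchanged.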
- get_cell_properties(item1: T, item2: T) dict[Any, Any] [source]
Get all properties of a cell, i.e., incidence between items of different levels
- Parameters:
item1 (hashable) – name of an item in level 0
item2 (hashable) – name of an item in level 1
- Returns:
{named cell property: cell property value, ..., misc. cell property column name: {cell property name: cell property value}}
- Return type:
dict
- get_cell_property(item1: T, item2: T, prop_name: Any) Any [source]
Get a property of a cell i.e., incidence between items of different levels
- Parameters:
item1 (hashable) – name of an item in level 0
item2 (hashable) – name of an item in level 1
prop_name (hashable) – name of the cell property to get
- Returns:
prop_val – value of the cell property
- Return type:
any
- property memberships: dict[str, hypernetx.classes.helpers.AttrList[str]]
Extends
Entity.memberships
Each item in level 1 (second column) defines a set containing all the level 0 (first column) items with which it appears in the same row of the underlying data table.
- Returns:
System of sets representation as dict of
{level 1 item: AttrList(level 0 items)}.
- Return type:
dict of AttrList
See also
elements
dual of this representation, i.e., each item in level 0 (first column) defines a set
restrict_to_levels
for more information on how memberships work for 1-dimensional (set) data
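The relationship between elements and memberships can be sketched in plain Python. This is only an illustration of the inversion described above, not the library's implementation; the function name `memberships_from_elements` is hypothetical and plain lists stand in for AttrList.

```python
from collections import defaultdict


def memberships_from_elements(elements):
    """Invert {level 0 item: [level 1 items]} into {level 1 item: [level 0 items]}.

    Hypothetical helper for illustration only; plain lists replace AttrList.
    """
    memberships = defaultdict(list)
    for level0_item, level1_items in elements.items():
        for level1_item in level1_items:
            # Each level 1 item collects every level 0 item it shares a row with.
            memberships[level1_item].append(level0_item)
    return dict(memberships)


elements = {"e1": [1, 2], "e2": [1, 2], "e3": [1, 2, 3]}
memberships = memberships_from_elements(elements)
print(memberships)
```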
- restrict_to(indices: int | Iterable[int], **kwargs) EntitySet [source]
Alias of
restrict_to_indices()
with default parameter level=0
- Parameters:
indices (array_like of int) – indices of item label(s) in level to restrict to
**kwargs – Extra arguments to
EntitySet
constructor
- Return type:
EntitySet
See also
restrict_to_indices
- restrict_to_levels(levels: int | Iterable[int], weights: bool = False, aggregateby: str | None = 'sum', keep_memberships: bool = True, **kwargs) EntitySet [source]
Extends
Entity.restrict_to_levels()
- Parameters:
levels (array-like of int) – indices of a subset of levels (columns) of data
weights (bool, default=False) – If True, aggregate existing cell weights to get new cell weights. Otherwise, all new cell weights will be 1.
aggregateby ({'sum', 'first', 'last', 'count', 'mean', 'median', 'max', 'min', None}, optional) – Method to aggregate weights of duplicate rows in data table. If None or weights=False, all new cell weights will be 1.
keep_memberships (bool, default=True) – Whether to preserve membership information for the discarded level when the new
EntitySet
is restricted to a single level
**kwargs – Extra arguments to
EntitySet
constructor
- Return type:
EntitySet
- Raises:
KeyError – If levels contains any invalid values
- set_cell_property(item1: T, item2: T, prop_name: Any, prop_val: Any) None [source]
Set a property of a cell i.e., incidence between items of different levels
- Parameters:
item1 (hashable) – name of an item in level 0
item2 (hashable) – name of an item in level 1
prop_name (hashable) – name of the cell property to set
prop_val (any) – value of the cell property to set
See also
- class classes.Hypergraph(setsystem: DataFrame | ndarray | Mapping[T, Iterable[T]] | Iterable[Iterable[T]] | Mapping[T, Mapping[T, Mapping[str, Any]]] | None = None, edge_col: str | int = 0, node_col: str | int = 1, cell_weight_col: str | int | None = 'cell_weights', cell_weights: Sequence[float] | float = 1.0, cell_properties: Sequence[str | int] | Mapping[T, Mapping[T, Mapping[str, Any]]] | None = None, misc_cell_properties_col: str | int | None = None, aggregateby: str | dict[str, str] = 'first', edge_properties: DataFrame | dict[T, dict[Any, Any]] | None = None, node_properties: DataFrame | dict[T, dict[Any, Any]] | None = None, properties: DataFrame | dict[T, dict[Any, Any]] | dict[T, dict[T, dict[Any, Any]]] | None = None, misc_properties_col: str | int | None = None, edge_weight_prop_col: str | int = 'weight', node_weight_prop_col: str | int = 'weight', weight_prop_col: str | int = 'weight', default_edge_weight: float | None = None, default_node_weight: float | None = None, default_weight: float = 1.0, name: str | None = None, **kwargs)[source]
Bases:
object
- Parameters:
setsystem ((optional) dict of iterables, dict of dicts, iterable of iterables, pandas.DataFrame, numpy.ndarray, default = None) – See SetSystems below for additional setsystem requirements.
edge_col ((optional) str | int, default = 0) – column index (or name) in pandas.dataframe or numpy.ndarray, used for (hyper)edge ids. Will be used to reference edgeids for all set systems.
node_col ((optional) str | int, default = 1) – column index (or name) in pandas.dataframe or numpy.ndarray, used for node ids. Will be used to reference nodeids for all set systems.
cell_weight_col ((optional) str | int, default = 'cell_weights') – column index (or name) in pandas.dataframe or numpy.ndarray used for referencing cell weights. For a dict of dicts, references the key in the cell property dicts.
cell_weights ((optional) Sequence[float,int] | int | float , default = 1.0) – User specified cell_weights or default cell weight. Sequential values are only used if setsystem is a dataframe or ndarray in which case the sequence must have the same length and order as these objects. Sequential values are ignored for dataframes if cell_weight_col is already a column in the data frame. If cell_weights is assigned a single value then it will be used as default for missing values or when no cell_weight_col is given.
cell_properties ((optional) Sequence[int | str] | Mapping[T,Mapping[T,Mapping[str,Any]]], default = None) – Column names from pd.DataFrame to use as cell properties, or a dict assigning cell properties to incidence pairs of edges and nodes. Will generate misc_cell_properties, which may have variable lengths per cell.
misc_cell_properties_col ((optional) str | int, default = None) – Column name of dataframe corresponding to a column of variable length property dictionaries for the cell. Ignored for other setsystem types.
aggregateby ((optional) str, dict, default = 'first') – By default duplicate edge,node incidences will be dropped unless specified with aggregateby. See pandas.DataFrame.agg() methods for additional syntax and usage information.
edge_properties ((optional) pd.DataFrame | dict, default = None) – Properties associated with edge ids. First column of dataframe or keys of dict link to edge ids in setsystem.
node_properties ((optional) pd.DataFrame | dict, default = None) – Properties associated with node ids. First column of dataframe or keys of dict link to node ids in setsystem.
properties ((optional) pd.DataFrame | dict, default = None) – Concatenation/union of edge_properties and node_properties. By default, the object id is used and should be the first column of the dataframe, or key in the dict. If there are nodes and edges with the same ids and different properties then use the edge_properties and node_properties keywords.
misc_properties_col ((optional) int | str, default = None) – Column of property dataframes with dtype=dict. Intended for variable length property dictionaries for the objects.
edge_weight_prop_col ((optional) str | int, default = 'weight') – Name of property in edge_properties to use for weight.
node_weight_prop_col ((optional) str | int, default = 'weight') – Name of property in node_properties to use for weight.
weight_prop_col ((optional) str | int, default = 'weight') – Name of property in properties to use for weight.
default_edge_weight ((optional) int | float, default = 1) – Used when edge weight property is missing or undefined.
default_node_weight ((optional) int | float, default = 1) – Used when node weight property is missing or undefined
name ((optional) str, default = None) – Name assigned to hypergraph
Hypergraphs in HNX 2.0
An hnx.Hypergraph H = (V,E) references a pair of disjoint sets: V = nodes (vertices) and E = (hyper)edges.
HNX allows for multi-edges by distinguishing edges by their identifiers instead of their contents. For example, if V = {1,2,3} and E = {e1,e2,e3}, where e1 = {1,2}, e2 = {1,2}, and e3 = {1,2,3}, the edges e1 and e2 contain the same set of nodes and yet are distinct and are distinguishable within H = (V,E).
New as of version 2.0, HNX provides methods to easily store and access additional metadata such as cell, edge, and node weights. Metadata associated with (edge,node) incidences are referenced as cell_properties. Metadata associated with a single edge or node is referenced as its properties.
The fundamental object needed to create a hypergraph is a setsystem. The setsystem defines the many-to-many relationships between edges and nodes in the hypergraph. Cell properties for the incidence pairs can be defined within the setsystem or in a separate pandas.DataFrame or dict. Edge and node properties are defined with a pandas.DataFrame or dict.
SetSystems
There are five types of setsystems currently accepted by the library.
iterable of iterables : Barebones hypergraph; uses Pandas default indexing to generate hyperedge ids. Elements must be hashable:
>>> H = Hypergraph([{1,2},{1,2},{1,2,3}])
dictionary of iterables : The most basic way to express many-to-many relationships while providing edge ids. The elements of the iterables must be hashable:
>>> H = Hypergraph({'e1':[1,2],'e2':[1,2],'e3':[1,2,3]})
dictionary of dictionaries : allows cell properties to be assigned to a specific (edge, node) incidence. This is particularly useful when there are variable length dictionaries assigned to each pair:
>>> d = {'e1': {1: {'w': 0.5, 'name': 'related_to'},
...             2: {'w': 0.1, 'name': 'related_to',
...                 'startdate': '05.13.2020'}},
...      'e2': {1: {'w': 0.52, 'name': 'owned_by'},
...             2: {'w': 0.2}},
...      'e3': {1: {'w': 0.5, 'name': 'related_to'},
...             2: {'w': 0.2, 'name': 'owner_of'},
...             3: {'w': 1, 'type': 'relationship'}}}
>>> H = Hypergraph(d, cell_weight_col='w')
pandas.DataFrame : For large datasets and for datasets with cell properties it is most efficient to construct a hypergraph directly from a pandas.DataFrame. Incidence pairs are in the first two columns. Cell properties shared by all incidence pairs can be placed in their own column of the dataframe. Variable length dictionaries of cell properties particular to only some of the incidence pairs may be placed in a single column of the dataframe. Representing the data above as a dataframe df:
====  ====  ====  ===============================================
col1  col2  w     col3
====  ====  ====  ===============================================
e1    1     0.5   {'name':'related_to'}
e1    2     0.1   {"name":"related_to", "startdate":"05.13.2020"}
e2    1     0.52  {"name":"owned_by"}
e2    2     0.2
…     …     …     {…}
====  ====  ====  ===============================================
The first row of the dataframe is used to reference each column.
>>> H = Hypergraph(df, edge_col="col1", node_col="col2",
...                cell_weight_col="w", misc_cell_properties_col="col3")
numpy.ndarray : For homogeneous datasets given in an ndarray, a pandas dataframe is generated and column names are added from the edge_col and node_col arguments. Cell properties containing multiple data types are added with a separate dataframe or dict and passed through the cell_properties keyword.
>>> arr = np.array([['e1','1'],['e1','2'],
...                 ['e2','1'],['e2','2'],
...                 ['e3','1'],['e3','2'],['e3','3']])
>>> H = hnx.Hypergraph(arr, column_names=['col1','col2'])
Edge and Node Properties
Properties specific to a single edge or node are passed through the keywords: edge_properties, node_properties, properties. Properties may be passed as dataframes or dicts. The first column or index of the dataframe, or the keys of the dict, correspond to the edge and/or node identifiers. If identifiers are shared among edges and nodes, or are distinct for edges and nodes, properties may be combined into a single object and passed to the properties keyword. For example:
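====  ======  ================================
id    weight  properties
====  ======  ================================
e1    5.0     {'type':'event'}
e2    0.52    {"name":"owned_by"}
…     …       {…}
1     1.2     {'color':'red'}
2     .003    {'name':'Fido','color':'brown'}
3     1.0     {}
====  ======  ================================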
A properties dictionary should have the format:
dp = {id1 : {prop1:val1, prop2:val2,...}, id2 : ... }
A properties dataframe may be used for nodes and edges sharing ids but differing in cell properties by adding a level index using 0 for edges and 1 for nodes:
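=====  ====  ======  ================================
level  id    weight  properties
=====  ====  ======  ================================
0      e1    5.0     {'type':'event'}
0      e2    0.52    {"name":"owned_by"}
…      …     …       {…}
1      1     1.2     {'color':'red'}
1      2     .003    {'name':'Fido','color':'brown'}
…      …     …       {…}
=====  ====  ======  ================================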
Weights
The default key for cell and object weights is “weight”. The default value is 1. Weights may be assigned and/or a new default prescribed in the constructor using cell_weight_col and cell_weights for incidence pairs, and using edge_weight_prop_col, node_weight_prop_col, weight_prop_col, default_edge_weight, and default_node_weight for node and edge weights.
- adjacency_matrix(s=1, index=False, remove_empty_rows=False)[source]
The s-adjacency matrix for the hypergraph.
- Parameters:
s (int, optional, default = 1) –
index (boolean, optional, default = False) – if True, will return the index of ids for rows and columns
remove_empty_rows (boolean, optional, default = False) –
- Returns:
adjacency_matrix (scipy.sparse.csr.csr_matrix)
node_index (list) – index of ids for rows and columns
- auxiliary_matrix(s=1, node=True, index=False)[source]
The unweighted s-edge or node auxiliary matrix for hypergraph
- Parameters:
s (int, optional, default = 1) –
node (bool, optional, default = True) – whether to return based on node or edge adjacencies
- Returns:
auxiliary_matrix (scipy.sparse.csr.csr_matrix) – Node/Edge adjacency matrix with empty rows and columns removed
index (np.array) – row and column index of userids
- bipartite()[source]
Constructs the networkX bipartite graph associated to hypergraph.
- Returns:
bipartite
- Return type:
nx.Graph()
Notes
Creates a bipartite networkx graph from hypergraph. The nodes and (hyper)edges of hypergraph become the nodes of bipartite graph. For every (hyper)edge e in the hypergraph and node n in e there is an edge (n,e) in the graph.
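The construction in the note above can be sketched without NetworkX by listing the (n, e) pairs directly. This is a minimal illustration that assumes the hypergraph is given as an incidence dictionary; `bipartite_edges` is a hypothetical helper, not part of the library.

```python
def bipartite_edges(incidence_dict):
    """List one (node, hyperedge) pair per incidence.

    Hypothetical helper: incidence_dict maps hyperedge ids to the nodes they contain.
    """
    return [
        (node, edge)
        for edge, nodes in incidence_dict.items()
        for node in nodes
    ]


pairs = bipartite_edges({"e1": [1, 2], "e2": [2, 3]})
print(pairs)
```

Each pair could then be passed to a graph library as an edge between the node and the hyperedge.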
- collapse_edges(name=None, return_equivalence_classes=False, use_reps=None, return_counts=None)[source]
Constructs a new hypergraph gotten by identifying edges containing the same nodes
- Parameters:
name (hashable, optional, default = None) –
return_equivalence_classes (boolean, optional, default = False) – Returns a dictionary of edge equivalence classes keyed by frozen sets of nodes
- Returns:
new hypergraph (Hypergraph) – Equivalent edges are collapsed to a single edge named by a representative of the equivalent edges followed by a colon and the number of edges it represents.
equivalence_classes (dict) – A dictionary keyed by representative edge names with values equal to the edges in its equivalence class
Notes
Two edges are identified if their respective elements are the same. Using this as an equivalence relation, the uids of the edges are partitioned into equivalence classes.
A single edge from the collapsed edges, followed by a colon and the number of elements in its equivalence class, is used as the uid for the new edge.
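The equivalence relation and naming scheme described in these notes can be illustrated on a plain incidence dictionary. The sketch below is not the library's code: `collapse_identical_edges` is a hypothetical name, and picking the first edge encountered as the representative is an assumption.

```python
from collections import defaultdict


def collapse_identical_edges(incidence_dict):
    """Group edges with identical node sets and name each group 'rep: count'.

    Hypothetical helper; the representative is the first edge seen (an assumption).
    """
    classes = defaultdict(list)
    for edge, nodes in incidence_dict.items():
        # Edges are equivalent when their node sets are equal.
        classes[frozenset(nodes)].append(edge)

    collapsed = {}
    equivalence_classes = {}
    for nodes, edges in classes.items():
        new_uid = f"{edges[0]}: {len(edges)}"
        collapsed[new_uid] = sorted(nodes)
        equivalence_classes[new_uid] = edges
    return collapsed, equivalence_classes


incidence = {"e1": [1, 2], "e2": [1, 2], "e3": [1, 2, 3]}
collapsed, classes = collapse_identical_edges(incidence)
print(collapsed)
print(classes)
```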
- collapse_nodes(name=None, return_equivalence_classes=False, use_reps=None, return_counts=None) Hypergraph [source]
Constructs a new hypergraph gotten by identifying nodes contained by the same edges
- Parameters:
name (str, optional, default = None) –
return_equivalence_classes (boolean, optional, default = False) – Returns a dictionary of node equivalence classes keyed by frozen sets of edges
use_reps (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE] Choose a single element from the collapsed nodes as uid for the new node, otherwise uses a frozen set of the uids of nodes in the equivalence class. If use_reps is True the new nodes have uids given by a tuple of the rep and the count
return_counts (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE]
- Returns:
new hypergraph
- Return type:
Hypergraph
Notes
Two nodes are identified if their respective memberships are the same. Using this as an equivalence relation, the uids of the nodes are partitioned into equivalence classes. A single member of the equivalence class is chosen to represent the class followed by the number of members of the class.
Example
>>> data = {'E1': ('a', 'b'), 'E2': ('a', 'b')}
>>> h = Hypergraph(data)
>>> h.collapse_nodes().incidence_dict
{'E1': ['a: 2'], 'E2': ['a: 2']}
- collapse_nodes_and_edges(name=None, return_equivalence_classes=False, use_reps=None, return_counts=None)[source]
Returns a new hypergraph by collapsing nodes and edges.
- Parameters:
name (str, optional, default = None) –
return_equivalence_classes (boolean, optional, default = False) – Returns a dictionary of edge equivalence classes keyed by frozen sets of nodes
use_reps (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE] Choose a single element from the collapsed elements as a representative. If use_reps is True, the new elements are keyed by a tuple of the rep and the count.
return_counts (boolean, optional, default = None) – [DEPRECATED; WILL BE REMOVED IN NEXT RELEASE]
- Returns:
new hypergraph
- Return type:
Hypergraph
Notes
Collapses the Nodes and Edges of EntitySets. Two nodes(edges) are duplicates if their respective memberships(elements) are the same. Using this as an equivalence relation, the uids of the nodes(edges) are partitioned into equivalence classes. A single member of the equivalence class is chosen to represent the class followed by the number of members of the class.
Example
>>> data = {'E1': ('a', 'b'), 'E2': ('a', 'b')}
>>> h = Hypergraph(data)
>>> h.incidence_dict
{'E1': ['a', 'b'], 'E2': ['a', 'b']}
>>> h.collapse_nodes_and_edges().incidence_dict
{'E1: 2': ['a: 2']}
- component_subgraphs(return_singletons=False, name=None)[source]
Same as
s_component_subgraphs()
with s=1. Returns iterator.
See also
- components(edges=False)[source]
Same as
s_connected_components()
with s=1, but nodes are returned by default. Returns iterator.
See also
- connected_component_subgraphs(return_singletons=True, name=None)[source]
Same as
s_component_subgraphs()
with s=1. Returns iterator.
See also
- connected_components(edges=False)[source]
Same as
s_connected_components()
with s=1, but nodes are returned by default. Returns iterator.
See also
- property dataframe
Returns dataframe of incidence pairs and their properties.
- Return type:
pd.DataFrame
- degree(node, s=1, max_size=None)[source]
The number of edges of size at least s (and at most max_size, if given) that contain node.
- Parameters:
node (hashable) – identifier for the node.
s (positive integer, optional, default 1) – smallest size of edge to consider in degree
max_size (positive integer or None, optional, default = None) – largest size of edge to consider in degree
- Return type:
int
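As a rough illustration of how s and max_size bound the edge sizes counted, the following sketch computes the same quantity on a plain incidence dictionary. It assumes both bounds are inclusive; `node_degree` is a hypothetical helper rather than the library's method.

```python
def node_degree(incidence_dict, node, s=1, max_size=None):
    """Count edges containing node whose size is between s and max_size.

    Hypothetical helper; both bounds are treated as inclusive (an assumption).
    """
    count = 0
    for nodes in incidence_dict.values():
        size = len(nodes)
        if node in nodes and size >= s and (max_size is None or size <= max_size):
            count += 1
    return count


incidence = {"e1": [1, 2], "e2": [1, 2], "e3": [1, 2, 3], "e4": [1]}
print(node_degree(incidence, 1))
print(node_degree(incidence, 1, s=2))
print(node_degree(incidence, 1, s=2, max_size=2))
```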
- diameter(s=1)[source]
Returns the length of the longest shortest s-walk between nodes in hypergraph
- Parameters:
s (int, optional, default 1) –
- Returns:
diameter
- Return type:
int
- Raises:
HyperNetXError – If hypergraph is not s-edge-connected
Notes
Two nodes are s-adjacent if they share s edges. Two nodes v_start and v_end are s-walk connected if there is a sequence of nodes v_start, v_1, v_2, … v_n-1, v_end such that consecutive nodes are s-adjacent. If the graph is not connected, an error will be raised.
- distance(source, target, s=1)[source]
Returns the shortest s-walk distance between two nodes in the hypergraph.
- Parameters:
source (node.uid or node) – a node in the hypergraph
target (node.uid or node) – a node in the hypergraph
s (positive integer) – the number of edges
- Returns:
s-walk distance
- Return type:
int
See also
Notes
The s-distance is the shortest s-walk length between the nodes. An s-walk between nodes is a sequence of nodes that pairwise share at least s edges. The length of the shortest s-walk is 1 less than the number of nodes in the path sequence.
Uses the networkx shortest_path_length method on the graph generated by the s-adjacency matrix.
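The definition in these notes can be illustrated with a breadth-first search over s-adjacent nodes. This sketch does not use NetworkX or the s-adjacency matrix as the method does; `s_walk_distance` is a hypothetical helper, and returning None when no s-walk exists is an assumption made for the example.

```python
from collections import deque


def s_walk_distance(incidence_dict, source, target, s=1):
    """Length of the shortest s-walk between two nodes, or None if there is none.

    Hypothetical helper; two nodes are s-adjacent when they share at least s edges.
    """
    memberships = {}
    for edge, nodes in incidence_dict.items():
        for node in nodes:
            memberships.setdefault(node, set()).add(edge)

    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == target:
            return dist
        for other in memberships:
            shared = memberships[current] & memberships[other]
            if other not in seen and len(shared) >= s:
                seen.add(other)
                queue.append((other, dist + 1))
    return None


S = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {5, 6}, "D": {6}}
print(s_walk_distance(S, 1, 4))        # walk 1, 2, 4 via edges A and B
print(s_walk_distance(S, 2, 3, s=2))   # nodes 2 and 3 share both A and B
```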
- dual(name=None, switch_names=True)[source]
Constructs a new hypergraph with roles of edges and nodes of hypergraph reversed.
- Parameters:
name (hashable, optional) –
switch_names (bool, optional, default = True) – reverses edge_col and node_col names unless edge_col = ‘edges’ and node_col = ‘nodes’
- Return type:
hypergraph
- edge_adjacency_matrix(s=1, index=False)[source]
The s-adjacency matrix for the dual hypergraph.
- Parameters:
s (int, optional, default 1) –
index (boolean, optional, default = False) – if True, will return the index of ids for rows and columns
- Returns:
edge_adjacency_matrix (scipy.sparse.csr.csr_matrix)
edge_index (list) – index of ids for rows and columns
Notes
This is also the adjacency matrix for the line graph. Two edges are s-adjacent if they share at least s nodes. If remove_zeros is True will return the auxiliary matrix
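The s-adjacency rule for edges can be sketched with nested lists instead of a sparse matrix. This is an illustration only: `edge_s_adjacency` is a hypothetical helper, and leaving the diagonal at zero is an assumption.

```python
def edge_s_adjacency(incidence_dict, s=1):
    """Dense 0/1 matrix marking pairs of distinct edges sharing at least s nodes.

    Hypothetical helper; returns the matrix and the edge ids indexing it.
    """
    edges = list(incidence_dict)
    node_sets = [set(incidence_dict[edge]) for edge in edges]
    matrix = [
        [
            1 if i != j and len(node_sets[i] & node_sets[j]) >= s else 0
            for j in range(len(edges))
        ]
        for i in range(len(edges))
    ]
    return matrix, edges


S = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {5, 6}, "D": {6}}
matrix_1, edge_index = edge_s_adjacency(S, s=1)
matrix_2, _ = edge_s_adjacency(S, s=2)
print(edge_index)
print(matrix_1)
print(matrix_2)
```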
- edge_diameter(s=1)[source]
Returns the length of the longest shortest s-walk between edges in hypergraph
- Parameters:
s (int, optional, default 1) –
- Returns:
edge_diameter
- Return type:
int
- Raises:
HyperNetXError – If hypergraph is not s-edge-connected
Notes
Two edges are s-adjacent if they share s nodes. Two edges e_start and e_end are s-walk connected if there is a sequence of edges e_start, e_1, e_2, … e_n-1, e_end such that consecutive edges are s-adjacent. If the graph is not connected, an error will be raised.
- edge_diameters(s=1)[source]
Returns the edge diameters of the s_edge_connected component subgraphs in hypergraph.
- Parameters:
s (int, optional, default 1) –
- Returns:
maximum diameter (int)
list of diameters (list) – List of edge_diameters for s-edge component subgraphs in hypergraph
list of component (list) – List of the edge uids in the s-edge component subgraphs.
- edge_distance(source, target, s=1)[source]
Returns the shortest s-walk distance between two edges in the hypergraph.
XX TODO: still need to return path and translate into user defined nodes and edges
- Parameters:
source (edge.uid or edge) – an edge in the hypergraph
target (edge.uid or edge) – an edge in the hypergraph
s (positive integer) – the number of intersections between pairwise consecutive edges
TODO (add edge weights) –
weight (None or string, optional, default = None) – if None then all edges have weight 1. If string then edge attribute string is used if available.
- Returns:
s-walk distance – A shortest s-walk is computed as a sequence of edges; the s-walk distance is the number of edges in the sequence minus 1. If no such path exists, returns np.inf.
- Return type:
the shortest s-walk edge distance
See also
Notes
The s-distance is the shortest s-walk length between the edges. An s-walk between edges is a sequence of edges such that consecutive pairwise edges intersect in at least s nodes. The length of the shortest s-walk is 1 less than the number of edges in the path sequence.
Uses the networkx shortest_path_length method on the graph generated by the s-edge_adjacency matrix.
- edge_neighbors(edge, s=1)[source]
The edges in hypergraph which share s node(s) with edge.
- Parameters:
edge (hashable or Entity) – uid for an edge in hypergraph or the edge Entity
s (int, list, optional, default = 1) – Minimum number of nodes shared by neighbors with edge.
- Returns:
List of edge neighbors
- Return type:
list
- property edge_props
Dataframe of edge properties indexed on edge ids
- Return type:
pd.DataFrame
- classmethod from_bipartite(B, set_names=('edges', 'nodes'), name=None, **kwargs)[source]
Static method creates a Hypergraph from a bipartite graph.
- Parameters:
B (nx.Graph()) – A networkx bipartite graph. Each node in the graph has a property ‘bipartite’ taking the value of 0 or 1 indicating a 2-coloring of the graph.
set_names (iterable of length 2, optional, default = ['edges','nodes']) – Category names assigned to the graph nodes associated to each bipartite set
name (hashable, optional) –
- Return type:
Hypergraph
Notes
A partition for the nodes in a bipartite graph generates a hypergraph.
>>> import networkx as nx
>>> B = nx.Graph()
>>> B.add_nodes_from([1, 2, 3, 4], bipartite=0)
>>> B.add_nodes_from(['a', 'b', 'c'], bipartite=1)
>>> B.add_edges_from([(1, 'a'), (1, 'b'), (2, 'b'), (2, 'c'),
...                   (3, 'c'), (4, 'a')])
>>> H = Hypergraph.from_bipartite(B)
>>> H.nodes, H.edges
# output: (EntitySet(_:Nodes,[1, 2, 3, 4],{}),
#          EntitySet(_:Edges,['b', 'c', 'a'],{}))
- classmethod from_incidence_dataframe(df, columns=None, rows=None, edge_col: str = 'edges', node_col: str = 'nodes', name=None, fillna=0, transpose=False, transforms=[], key=None, return_only_dataframe=False, **kwargs)[source]
Create a hypergraph from a Pandas Dataframe object, which has values equal to the incidence matrix of a hypergraph. Its index will identify the nodes and its columns will identify its edges.
- Parameters:
df (Pandas.Dataframe) – a real valued dataframe with a single index
columns ((optional) list, default = None) – restricts df to the columns with headers in this list.
rows ((optional) list, default = None) – restricts df to the rows indexed by the elements in this list.
name ((optional) string, default = None) –
fillna (float, default = 0) – a real value to place in empty cell, all-zero columns will not generate an edge.
transpose ((optional) bool, default = False) – option to transpose the dataframe, in this case df.Index will identify the edges and df.columns will identify the nodes, transpose is applied before transforms and key
transforms ((optional) list, default = []) – optional list of transformations to apply to each column of the dataframe using pd.DataFrame.apply(). Transformations are applied in the order they are given (ex. abs). To apply transforms to rows or for additional functionality, consider transforming df using pandas.DataFrame methods prior to generating the hypergraph.
key ((optional) function, default = None) – boolean function to be applied to the entire dataframe.
return_only_dataframe ((optional) bool, default = False) – to use the incidence_dataframe with cell_properties or properties, set this to true and use it as the setsystem in the Hypergraph constructor.
See also
- Return type:
- classmethod from_incidence_matrix(M, node_names=None, edge_names=None, node_label='nodes', edge_label='edges', name=None, key=None, **kwargs)[source]
Same as from_numpy_array.
- classmethod from_numpy_array(M, node_names=None, edge_names=None, node_label='nodes', edge_label='edges', name=None, key=None, **kwargs)[source]
Create a hypergraph from a real valued matrix represented as a 2-dimensional numpy array. The matrix is converted to a matrix of 0’s and 1’s so that any truthy cells are converted to 1’s and all others to 0’s.
- Parameters:
M (real valued array-like object, 2 dimensions) – representing a real valued matrix with rows corresponding to nodes and columns to edges
node_names (object, array-like, default=None) – List of node names must be the same length as M.shape[0]. If None then the node names correspond to row indices with ‘v’ prepended.
edge_names (object, array-like, default=None) – List of edge names must have the same length as M.shape[1]. If None then the edge names correspond to column indices with ‘e’ prepended.
name (hashable) –
key ((optional) function) – boolean function to be evaluated on each cell of the array, must be applicable to numpy.array
- Return type:
Hypergraph
Note
The constructor does not generate empty edges. All zero columns in M are removed and the names corresponding to these edges are discarded.
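The conversion described above (truthy cells become incidences, default names use 'v' and 'e' prefixes, all-zero columns are dropped) can be sketched on a list of lists. This is an illustration under those stated rules, not the library's implementation; `setsystem_from_matrix` is a hypothetical helper and no key function is applied.

```python
def setsystem_from_matrix(M, node_names=None, edge_names=None):
    """Turn a nodes-by-edges matrix into {edge name: [node names]}.

    Hypothetical helper; rows are nodes, columns are edges, truthy cells are incidences.
    """
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    if node_names is None:
        node_names = [f"v{i}" for i in range(n_rows)]
    if edge_names is None:
        edge_names = [f"e{j}" for j in range(n_cols)]

    setsystem = {}
    for j, edge in enumerate(edge_names):
        members = [node_names[i] for i in range(n_rows) if M[i][j]]
        if members:  # all-zero columns do not generate an edge
            setsystem[edge] = members
    return setsystem


M = [
    [1, 0, 0],
    [2.5, 0, 1],
    [0, 0, 1],
]
print(setsystem_from_matrix(M))
```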
- get_cell_properties(edge: str, node: str, prop_name: str | None = None) Any | dict[str, Any] [source]
Get cell properties on a specified edge and node
- Parameters:
edge (str) – edgeid
node (str) – nodeid
prop_name (str, optional) – name of a cell property; if None, all cell properties will be returned
- Returns:
cell property value if prop_name is provided, otherwise
dict
of all cell properties and values
- Return type:
int or str or dict of {str: any}
- get_linegraph(s=1, edges=True)[source]
Creates an s-linegraph for the Hypergraph. If edges=True (default), then the edges will be the vertices of the line graph. Two vertices are connected by an s-line-graph edge if the corresponding hypergraph edges intersect in at least s hypergraph nodes. If edges=False, the hypergraph nodes will be the vertices of the line graph. Two vertices are connected if the nodes they correspond to share at least s incident hyperedges.
- Parameters:
s (int) – The width of the connections.
edges (bool, optional, default = True) – Determine if edges or nodes will be the vertices in the linegraph.
- Returns:
A NetworkX graph.
- Return type:
nx.Graph
- get_properties(id, level=None, prop_name=None)[source]
Returns an object’s specific property or all properties
- Parameters:
id (hashable) – edge or node id
level (int | None , optional, default = None) – if separate edge and node properties then enter 0 for edges and 1 for nodes.
prop_name (str | None, optional, default = None) – if None then all properties associated with the object will be returned.
- Returns:
single property or dictionary of properties
- Return type:
str or dict
- incidence_dataframe(sort_rows=False, sort_columns=False, cell_weights=True)[source]
Returns a pandas dataframe for hypergraph indexed by the nodes and with column headers given by the edge names.
- Parameters:
sort_rows (bool, optional, default = False) – sort rows based on hashable node names
sort_columns (bool, optional, default = False) – sort columns based on hashable edge names
cell_weights (bool, optional, default =True) –
- property incidence_dict
Dictionary keyed by edge uids with values the uids of nodes in each edge
- Return type:
dict
- incidence_matrix(weights=False, index=False)[source]
An incidence matrix for the hypergraph indexed by nodes x edges.
- Parameters:
weights (bool, default =False) – If False all nonzero entries are 1. If True and self.static all nonzero entries are filled by self.edges.cell_weights dictionary values.
index (boolean, optional, default = False) – If True return will include a dictionary of node uid : row number and edge uid : column number
- Returns:
incidence_matrix (scipy.sparse.csr.csr_matrix or np.ndarray)
row_index (list) – index of node ids for rows
col_index (list) – index of edge ids for columns
- is_connected(s=1, edges=False)[source]
Determines if hypergraph is s-connected.
- Parameters:
s (int, optional, default 1) –
edges (boolean, optional, default = False) – If True, will determine if s-edge-connected. For s=1 s-edge-connected is the same as s-connected.
- Returns:
is_connected
- Return type:
boolean
Notes
A hypergraph is s node connected if for any two nodes v0,vn there exists a sequence of nodes v0,v1,v2,…,v(n-1),vn such that every consecutive pair of nodes v(i),v(i+1) share at least s edges.
A hypergraph is s edge connected if for any two edges e0,en there exists a sequence of edges e0,e1,e2,…,e(n-1),en such that every consecutive pair of edges e(i),e(i+1) share at least s nodes.
- neighbors(node, s=1)[source]
The nodes in hypergraph which share s edge(s) with node.
- Parameters:
node (hashable or Entity) – uid for a node in hypergraph or the node Entity
s (int, list, optional, default = 1) – Minimum number of edges shared by neighbors with node.
- Returns:
neighbors – s-neighbors share at least s edges in the hypergraph
- Return type:
list
- node_diameters(s=1)[source]
Returns the node diameters of the connected components in hypergraph.
- Returns:
list of the diameters of the s-components and list of the s-component nodes
- property node_props
Dataframe of node properties indexed on node ids
- Return type:
pd.DataFrame
- number_of_edges(edgeset=None)[source]
The number of edges in edgeset belonging to hypergraph.
- Parameters:
edgeset (an iterable of Entities, optional, default = None) – If None, then return the number of edges in hypergraph.
- Returns:
number_of_edges
- Return type:
int
- number_of_nodes(nodeset=None)[source]
The number of nodes in nodeset belonging to hypergraph.
- Parameters:
nodeset (an iterable of Entities, optional, default = None) – If None, then return the number of nodes in hypergraph.
- Returns:
number_of_nodes
- Return type:
int
- property properties
Returns dataframe of edge and node properties.
- Return type:
pd.DataFrame
- remove(keys, level=None, name=None)[source]
Creates a new hypergraph with nodes and/or edges indexed by keys removed. More efficient for creating a restricted hypergraph if the restricted set is greater than what is being removed.
- Parameters:
keys (list | tuple | set | Hashable) – node and/or edge id(s) to remove
level (None, optional) – Enter 0 to remove edges with ids in keys. Enter 1 to remove nodes with ids in keys. If None then all objects in nodes and edges with the id will be removed.
name (str, optional) – Name of new hypergraph
- Return type:
hnx.Hypergraph
- remove_singletons(name=None)[source]
Constructs clone of hypergraph with singleton edges removed.
- Returns:
new hypergraph
- Return type:
Hypergraph
- restrict_to_edges(edges, name=None)[source]
New hypergraph gotten by restricting to edges
- Parameters:
edges (Iterable) – edgeids to restrict to
- Return type:
hnx.Hypergraph
- restrict_to_nodes(nodes, name=None)[source]
New hypergraph gotten by restricting to nodes
- Parameters:
nodes (Iterable) – nodeids to restrict to
- Return type:
hnx.Hypergraph
- s_component_subgraphs(s=1, edges=True, return_singletons=False, name=None)[source]
Returns a generator for the induced subgraphs of the s-connected components. Singleton components are removed unless return_singletons is set to True. Components are computed using the s-linegraph generated either by the hypergraph (edges=True) or by its dual (edges=False).
- Parameters:
s (int, optional, default 1) –
edges (boolean, optional, default = True) – Determines whether edge or node components are desired. If True, subgraphs are built from the s-edge-connected components; if False, from the s-node-connected components. Each subgraph is the hypergraph restricted to the edges (or nodes) of one component.
return_singletons (bool, optional) –
- Yields:
s_component_subgraphs (iterator) – Iterator returns subgraphs generated by the edges (or nodes) in the s-edge(node) components of hypergraph.
- s_components(s=1, edges=True, return_singletons=True)[source]
Same as s_connected_components, except that return_singletons defaults to True in this signature.
See also
s_connected_components
- s_connected_components(s=1, edges=True, return_singletons=False)[source]
Returns a generator for the s-edge-connected components or the s-node-connected components of the hypergraph.
- Parameters:
s (int, optional, default 1) –
edges (boolean, optional, default = True) – If True will return edge components, if False will return node components
return_singletons (bool, optional, default = False) –
Notes
If edges=True, this method yields the s-edge-connected components as sets of edge uids. An s-edge-component has the property that for any two edges e1 and e2 there is a sequence of edges starting with e1 and ending with e2 such that consecutive edges in the sequence intersect in at least s nodes. If s=1 these are the path components of the hypergraph.
If edges=False, this method yields the s-node-connected components as sets of node uids. Two nodes v1 and v2 are s-walk-connected if there is a sequence of nodes starting with v1 and ending with v2 such that consecutive nodes in the sequence share at least s edges. If s=1 these are the path components of the hypergraph.
Example
>>> S = {'A':{1,2,3},'B':{2,3,4},'C':{5,6},'D':{6}}
>>> H = Hypergraph(S)
>>> list(H.s_components(edges=True))
[{'C', 'D'}, {'A', 'B'}]
>>> list(H.s_components(edges=False))
[{1, 2, 3, 4}, {5, 6}]
- Yields:
s_connected_components (iterator) – Iterator returns sets of uids of the edges (or nodes) in the s-edge(node) components of hypergraph.
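The definition in the Notes can be made concrete with a small pure-Python sketch. It does not reproduce the library's linegraph-based computation; it links two edges whenever they share at least s nodes and then walks the resulting graph. The names `dual_sketch` and `s_components_sketch` are hypothetical, and keeping every edge (or node) as a vertex, even one that is s-adjacent to nothing, is an assumption.

```python
from itertools import combinations


def dual_sketch(incidence):
    """Swap roles: node id -> set of edge ids containing that node."""
    dual = {}
    for edge, nodes in incidence.items():
        for node in nodes:
            dual.setdefault(node, set()).add(edge)
    return dual


def s_components_sketch(incidence, s=1, edges=True, return_singletons=False):
    """Yield sets of edge ids (or node ids) that are s-connected."""
    sets = incidence if edges else dual_sketch(incidence)
    # Two objects are s-adjacent when their sets overlap in at least s items.
    adjacent = {key: set() for key in sets}
    for a, b in combinations(sets, 2):
        if len(sets[a] & sets[b]) >= s:
            adjacent[a].add(b)
            adjacent[b].add(a)
    seen = set()
    for start in sets:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first walk over s-adjacent objects
            key = stack.pop()
            if key in component:
                continue
            component.add(key)
            stack.extend(adjacent[key] - component)
        seen |= component
        if return_singletons or len(component) > 1:
            yield component


S = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {5, 6}, "D": {6}}
edge_components = list(s_components_sketch(S, s=1, edges=True))
node_components = list(s_components_sketch(S, s=1, edges=False))
```

On the example data the sketch groups the same ids as the doctest above, although the order in which components are yielded may differ. With s=2 only A and B remain connected, because C and D share a single node.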
- set_state(**kwargs)[source]
Allow state_dict updates from outside of class. Use with caution.
- Parameters:
**kwargs – key=value pairs to save in state dictionary
- property shape
(number of nodes, number of edges)
- Return type:
tuple
- singletons()[source]
Returns a list of singleton edges. A singleton edge is an edge of size 1 with a node of degree 1.
- Returns:
singles – A list of edge uids.
- Return type:
list
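The two conditions in the definition (edge of size 1, and its node of degree 1) can be checked directly on a dict of edge id to node set. This is an illustrative sketch with a hypothetical helper name, not the library's code.

```python
# Hypothetical stand-in for a hypergraph: edge id -> set of node ids.
incidence = {"A": {1, 2, 3}, "C": {5, 6}, "D": {6}, "E": {7}}


def singleton_edges_sketch(incidence):
    """List edges of size 1 whose only node belongs to no other edge."""
    degree = {}
    for nodes in incidence.values():
        for node in nodes:
            degree[node] = degree.get(node, 0) + 1
    return [
        edge
        for edge, nodes in incidence.items()
        if len(nodes) == 1 and all(degree[node] == 1 for node in nodes)
    ]


singles = singleton_edges_sketch(incidence)
```

Edge D has size 1 but is not a singleton here, because its node 6 also belongs to C and so has degree 2; only E satisfies both conditions.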
- size(edge, nodeset=None)[source]
The number of nodes in nodeset that belong to edge. If nodeset is None, returns the size of edge.
- Parameters:
edge (hashable) – The uid of an edge in the hypergraph
- Returns:
size
- Return type:
int
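As a brief illustration of the nodeset argument, the size of an edge relative to a nodeset is the size of their intersection. The sketch below assumes a plain dict of edge id to node set; `edge_size_sketch` is a hypothetical name.

```python
# Hypothetical stand-in for a hypergraph: edge id -> set of node ids.
incidence = {"A": {1, 2, 3}, "B": {2, 3, 4}}


def edge_size_sketch(incidence, edge, nodeset=None):
    """Count the nodes of edge, optionally only those also in nodeset."""
    nodes = incidence[edge]
    if nodeset is None:
        return len(nodes)
    return len(nodes & set(nodeset))


full_size = edge_size_sketch(incidence, "A")
partial_size = edge_size_sketch(incidence, "A", nodeset=[2, 3, 4])
```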
- toplexes(name=None)[source]
Returns a simple hypergraph corresponding to self.
Warning
Collapsing is no longer supported inside the toplexes method. Instead generate a new collapsed hypergraph and compute the toplexes of the new hypergraph.
- Parameters:
name (str, optional, default = None) –