API Documentation

Modules and Functions

interflow.analyze.group_results(df: pandas.core.frame.DataFrame, output_level=1)

Groups package run output to level of granularity specified. Returns a dataframe of values with source to target flows by region.

Parameters
  • df (DataFrame) – Dataframe of package run output values.

  • output_level (int) – Level of granularity output will be grouped to. Must be an integer between one and five, inclusive. Default value is set to level 1 granularity.

Returns

DataFrame of values organized as source to target flow values summed to level of granularity specified.
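
The grouping described above can be pictured with a minimal, standard-library sketch. It is not the package's implementation: the record layout, the sector names, and the helper `group_flows` are hypothetical and exist only to show how summing to a coarser level of granularity works.

```python
from collections import defaultdict

# Hypothetical flow records: (region, source levels, target levels, value).
# The level names are illustrative and not taken from the package's data.
flows = [
    ("01001", ("WSW", "fresh", "surface"), ("PWS", "fresh", "surface"), 10.0),
    ("01001", ("WSW", "fresh", "ground"), ("PWS", "fresh", "ground"), 5.0),
    ("01001", ("WSW", "saline", "surface"), ("IND", "saline", "surface"), 2.0),
]


def group_flows(records, output_level=1):
    """Sum flow values after truncating source and target names to output_level."""
    if not 1 <= output_level <= 5:
        raise ValueError("output_level must be an integer between 1 and 5")
    totals = defaultdict(float)
    for region, source, target, value in records:
        key = (region, source[:output_level], target[:output_level])
        totals[key] += value
    return dict(totals)


level_1 = group_flows(flows, output_level=1)
level_3 = group_flows(flows, output_level=3)
```

At level 1 the two fresh-water flows collapse into a single source to target pair, while level 3 keeps all three records separate.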

interflow.calc_flow.calculate(data: pandas.core.frame.DataFrame, level=5, region_name=None, remove_loops=True, output_file_path=None) pandas.core.frame.DataFrame

Loops through input data for each region provided or specified and (1) collects flows for input data, (2) calculates any cross unit flows based on input flow intensity values, (3) builds source flows for calculated intensities based on source fraction assumptions, and (4) builds discharge flows for calculated intensities and input data based on discharge fractions. Finally, outputs are aggregated to specified level of granularity. This function also removes all self-provided (i.e., looped) flows if remove_loops parameter is set to True.

Parameters
  • data (DataFrame) – Pandas DataFrame of input flow values, intensities, and fractions

  • level (int) – Desired level of granularity of output data. Must be an integer between 1 and 5, inclusive.

  • region_name (str) – Name of region to conduct analysis of. If none is specified, calculations are run for all regions included in the input data.

  • remove_loops (bool) – Boolean indicating whether looped values (i.e., nodes whose output is its own input value) should be removed from output dataset. Default is True.

  • output_file_path (str) – Optional parameter to give a file path, inclusive of file name, to save dataframe output as a csv. Default is set to None (no output saved)

Returns

DataFrame of flow run output at specified level of granularity for specified region(s)
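
As a rough illustration of steps (1) through (4) and of loop removal, the sketch below walks one hypothetical water flow through the same sequence using plain Python. The sector names, the numbers, and the helper `drop_loops` are assumptions made for this example; the package's own data structures and naming are not shown here.

```python
# Hypothetical single-region inputs; sector names and numbers are illustrative only.
water_to_pws_mgd = 100.0               # (1) input flow collected from the data
energy_intensity_bbtu_per_mgd = 0.5    # intensity linking the water flow to energy
source_fractions = {"electricity": 0.75, "natural gas": 0.25}
discharge_fractions = {"rejected energy": 0.5, "energy services": 0.5}

# (2) cross-unit flow: energy required by the water flow
energy_demand_bbtu = water_to_pws_mgd * energy_intensity_bbtu_per_mgd

# (3) source flows: split the calculated demand across its sources
source_flows = {
    (source, "PWS"): energy_demand_bbtu * fraction
    for source, fraction in source_fractions.items()
}

# (4) discharge flows: split the calculated demand across its discharges
discharge_flows = {
    ("PWS", target): energy_demand_bbtu * fraction
    for target, fraction in discharge_fractions.items()
}

# A self-provided flow is added here only to demonstrate loop removal.
all_flows = {**source_flows, **discharge_flows, ("PWS", "PWS"): 1.0}


def drop_loops(flows):
    """Remove flows whose source and target are the same node."""
    return {pair: value for pair, value in flows.items() if pair[0] != pair[1]}


flows_without_loops = drop_loops(all_flows)
```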

interflow.construct.construct_nested_dictionary(df: pandas.core.frame.DataFrame) dict

Takes in a 16-column DataFrame of values and returns a nested dictionary.

Parameters

df (DataFrame) – dataframe of values to convert to nested dictionary

Returns

Nested dictionary of dataframe values
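
The idea of nesting tabular rows can be sketched without pandas. The helper `build_nested` and the five-column rows below are hypothetical and much narrower than the 16-column input the function expects; they only show the general pattern of turning leading columns into nested keys.

```python
def build_nested(rows):
    """Nest each row's leading columns as keys and store the last column as the value."""
    nested = {}
    for *keys, value in rows:
        node = nested
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return nested


# Hypothetical rows: region, source, target, units, value.
rows = [
    ("01001", "WSW", "PWS", "mgd", 10.0),
    ("01001", "WSW", "IND", "mgd", 2.0),
    ("01003", "WSW", "PWS", "mgd", 4.0),
]
nested = build_nested(rows)
```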

interflow.deconstruct.deconstruct_dictionary(input_dict: dict) pandas.core.frame.DataFrame

Takes in a nested dictionary of run values and returns a dataframe with flow information as columns.

Parameters

input_dict (dict) – nested dictionary of values to unpack into a dataframe

Returns

Pandas Dataframe
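
Unpacking a nested dictionary into rows can be sketched with a short recursive generator. The helper `flatten` and the sample dictionary are hypothetical; the package's actual column layout is not shown in this reference.

```python
def flatten(nested, path=()):
    """Yield one tuple per leaf: the keys along the path followed by the value."""
    for key, value in nested.items():
        if isinstance(value, dict):
            yield from flatten(value, path + (key,))
        else:
            yield path + (key, value)


# Hypothetical run values: region -> source -> target -> units -> value.
run_values = {"01001": {"WSW": {"PWS": {"mgd": 10.0}, "IND": {"mgd": 2.0}}}}
rows = list(flatten(run_values))
```

Each resulting tuple could then become one row of a dataframe, with the keys as flow information columns.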

interflow.reader.get_coal_mine_location_data()

Read in data with information on the county location of individual coal mines.

Returns

dataframe of coal mine location values

interflow.reader.get_coal_production_data()

Read in 2015 data from US EIA on coal production and mine type at the coal-mine level.

Returns

dataframe of coal mine production values

interflow.reader.get_corn_irrigation_data()

Read in data from USDA Farm and Ranch Irrigation Survey on total irrigation to all crops and corn production.

Returns

dataframe of irrigation values

interflow.reader.get_corn_production_data()

Read in data from USDA on the total corn production for 2015 at the county level.

Returns

dataframe of county-level corn production values

interflow.reader.get_county_fips_data()

Read in data to map the 2015 county alphanumeric names to county FIPS codes.

Returns

dataframe of county names and FIPS codes

interflow.reader.get_county_petroleum_natgas_production_data()

Read in county-level petroleum and natural gas production data.

Returns

dataframe of natural gas and petroleum production values

interflow.reader.get_electricity_cooling_flow_data()

Read in USGS 2015 data on thermoelectric cooling withdrawals and water consumption for individual power plants.

Returns

dataframe of thermoelectric cooling values

interflow.reader.get_electricity_demand_data()

Read in data from US EIA for 2015 on the total electricity demand in each state by the residential, commercial, industrial, and transportation sector.

Returns

dataframe of electricity demand values

interflow.reader.get_electricity_generation_data()

Read in electricity generation and fuel use by individual power plants in the US for 2015.

Returns

dataframe of electricity generation and fuel use values

interflow.reader.get_electricity_water_intensity_data()

Read in water intensity data for various types of power plant technologies.

Returns

dataframe of water intensity values

interflow.reader.get_ethanol_plant_location_data()

Read in data on ethanol plant locations for 2015.

Returns

dataframe of ethanol plant location values

interflow.reader.get_fuel_demand_data()

Read in data from US EIA for 2015 on the total fuel demand in each state by the residential, commercial, industrial, and transportation sector.

Returns

dataframe of fuel demand values

interflow.reader.get_fuel_renaming_data()

Read in data to rename fuel demand variables.

Returns

dataframe of new variable names to map to old variable names

interflow.reader.get_irrigation_pumping_data()

Read in data from the 2013 USDA Farm and Ranch Irrigation Survey with state-level information on average well depth (ft), average operating pressure (psi), average pumping capacity (gpm), and the amount of irrigation pumping using electricity, natural gas, propane, and diesel.

Returns

dataframe of irrigation pumping values

interflow.reader.get_petroleum_natgas_rename_data()

Read in data to rename original petroleum and natural gas values to their long-form descriptive name.

Returns

dataframe of new variable names to map to old variable names

interflow.reader.get_power_plant_location_data()

Read in data on the location (county, state) of individual power plants (by plant code) in the US for 2015.

Returns

dataframe of power plants and their locations

interflow.reader.get_pumping_intensity_rename_data()

Read in data to rename pumping intensity variables.

Returns

dataframe of rename values to map to old names

interflow.reader.get_state_fips_crosswalk_data()

Read in data with state names, state abbreviations, and state-level FIPS codes.

Returns

dataframe of state identification values

interflow.reader.get_state_fuel_production_data()

Read in data from US EIA for 2015 with state-level fuel production data including biomass, natural gas, and petroleum.

Returns

dataframe of fuel production values

interflow.reader.get_state_petroleum_natgas_water_data()

Read in state-level data on the water to oil and water to natural gas ratios as well as the percent of water from each that is injected, consumed, or discharged to the surface.

Returns

dataframe of natural gas and petroleum values

interflow.reader.get_state_water_to_conventional_oil_data()

Read in data on the water to oil ratio by PADD region for conventional oil production.

Returns

dataframe of water intensity values

interflow.reader.get_state_water_to_unconventional_production_data()

Read in state-level data on water use in the production of unconventional natural gas and petroleum.

Returns

dataframe of water use values

interflow.reader.get_tx_ibt_data()

Read in data on Texas interbasin water transfers for 2015.

Returns

dataframe of interbasin transfer values

interflow.reader.get_wastewater_discharge_data()

Read in wastewater facility discharge data.

Returns

dataframe of wastewater discharge values

interflow.reader.get_wastewater_flow_data()

Read in wastewater facility water flow data.

Returns

dataframe of wastewater flow values

interflow.reader.get_wastewater_location_data()

Read in wastewater facility location data.

Returns

dataframe of wastewater location values

interflow.reader.get_wastewater_type_data()

Read in wastewater facility treatment type data.

Returns

dataframe of wastewater treatment values

interflow.reader.get_water_consumption_rename_data()

Read in 1995 water use rename key data.

Returns

dataframe of variable names to map to original names

interflow.reader.get_water_use_1995_data()

Read in 1995 USGS water use data

Returns

dataframe of water use values

interflow.reader.get_water_use_2015_data()

Read in 2015 USGS water use data

Returns

dataframe of 2015 water use values

interflow.reader.get_water_use_rename_data()

Read in variable renaming key for USGS 2015 water use data

Returns

dataframe of variable names to map to original names

interflow.reader.get_west_ibt_data()

Read in data on western interbasin water transfers.

Returns

dataframe of interbasin transfer values for western states

interflow.reader.load_sample_data_output() pandas.core.frame.DataFrame

Read in a copy of the run output for all US counties.

Returns

dataframe of county output values

interflow.reader.load_sample_geojson_data()

Read in GeoJSON file with county-level information for mapping all US counties.

Returns

GeoJSON data with county-level geometry information for mapping

interflow.reader.read_sample_data() pandas.core.frame.DataFrame

Read in complete sample input csv data as a Pandas DataFrame.

Returns

DataFrame of complete sample data values for US Counties

interflow.sample_data.calc_discharge_fractions()

Takes water flows to residential, commercial, industrial, mining, and non-irrigation agriculture sectors and calculates their discharge fractions to the surface and ocean. All water that is not consumed by these sectors is assumed to be discharged to either the surface or ocean.

Returns

DataFrame of discharge fractions

interflow.sample_data.calc_hydro_water_intensity(intensity_cap=True, intensity_cap_amt=6000) pandas.core.frame.DataFrame

Calculates the water use (mgd) required per bbtu of hydroelectric generation. Daily water use (mgd) is combined with daily generation from hydropower for each region from 1995 USGS data. Discharge and source fraction variables are also created. Only counties with hydroelectric generation in 2015 are assigned intensity estimates.

Parameters
  • intensity_cap (bool) – If set to true, applies a cap to the water intensity value in any county.

  • intensity_cap_amt – Sets the amount of the water intensity cap in mgd per bbtu

Returns

DataFrame of water intensity of hydroelectric generation by county
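
A minimal sketch of the intensity calculation and the cap follows. The helper `hydro_water_intensity` is hypothetical and works on single values rather than county dataframes; returning None for counties without generation is an assumption made for this example.

```python
def hydro_water_intensity(water_use_mgd, generation_bbtu,
                          intensity_cap=True, intensity_cap_amt=6000):
    """Return water use per unit of generation (mgd per bbtu), optionally capped."""
    if generation_bbtu <= 0:
        # Assumption: no estimate is assigned where there is no generation.
        return None
    intensity = water_use_mgd / generation_bbtu
    if intensity_cap:
        intensity = min(intensity, intensity_cap_amt)
    return intensity


typical = hydro_water_intensity(1000.0, 2.0)
capped = hydro_water_intensity(70000.0, 2.0)
uncapped = hydro_water_intensity(70000.0, 2.0, intensity_cap=False)
no_generation = hydro_water_intensity(10.0, 0.0)
```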

interflow.sample_data.calc_irrigation_consumption() pandas.core.frame.DataFrame

Takes 2015 USGS water flow data and calculates consumption fractions for crop irrigation and golf irrigation based on consumptive use in those sub-sectors. Additionally, water withdrawal values for crop irrigation are filled in with general irrigation values for counties with missing crop irrigation data.

Returns

Dataframe of 2015 water flow values and irrigation sub-sector consumption fractions

interflow.sample_data.calc_irrigation_conveyance_loss_fraction(loss_cap=True, loss_cap_amt=0.9) pandas.core.frame.DataFrame

This function calculates the fraction of water lost during conveyance for irrigation (crop and golf) for surface water, groundwater, and reused wastewater. The fraction is calculated as water lost in conveyance of irrigation water divided by total water withdrawn for irrigation. Values for states with no conveyance losses are replaced with the national average. Values for counties with missing data are replaced with the state average.

Parameters
  • loss_cap (bool) – If True, a cap is placed on the conveyance loss fraction

  • loss_cap_amt (float) – The amount at which irrigation losses are capped. Values above the cap are replaced by the specified cap amount. The default value is 0.90.

Returns

DataFrame of conveyance loss fractions by row
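
The loss fraction, the cap, and the state-average fill can be sketched as below. The county labels, the numbers, and the helper `loss_fraction` are hypothetical, and the fill is simplified to a single state.

```python
def loss_fraction(loss_mgd, withdrawn_mgd, loss_cap=True, loss_cap_amt=0.9):
    """Return conveyance loss divided by total withdrawal, optionally capped."""
    if withdrawn_mgd == 0:
        return None  # treated as a missing value in this sketch
    fraction = loss_mgd / withdrawn_mgd
    if loss_cap:
        fraction = min(fraction, loss_cap_amt)
    return fraction


# Hypothetical counties in one state: (conveyance loss, total withdrawal) in mgd.
counties = {"A": (2.0, 10.0), "B": (19.0, 20.0), "C": (0.0, 0.0)}
fractions = {name: loss_fraction(*values) for name, values in counties.items()}

# Counties with missing values take the average of the known values in the state.
known = [f for f in fractions.values() if f is not None]
state_average = sum(known) / len(known)
filled = {name: state_average if f is None else f for name, f in fractions.items()}
```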

interflow.sample_data.calc_irrigation_discharge_flows()

Recalculates the consumption fractions for crop and golf irrigation given the calculated conveyance loss fractions. Returns irrigation discharges to consumption, conveyance losses, and surface discharge. The fraction sent to consumption is assumed to be the prior consumption fraction multiplied by any remaining water after conveyance losses. Surface discharge fraction is calculated as any remaining percentage after consumption.

Returns

Dataframe of recalculated irrigation consumption fractions, conveyance losses, and surface discharge
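
The recalculation described above amounts to a few lines of arithmetic. The sketch below uses a hypothetical helper, `irrigation_discharge_fractions`, and made-up fractions for a single county.

```python
def irrigation_discharge_fractions(prior_consumption_fraction, conveyance_loss_fraction):
    """Split irrigation water into conveyance losses, consumption, and surface discharge."""
    remaining = 1.0 - conveyance_loss_fraction
    consumption = prior_consumption_fraction * remaining
    surface_discharge = remaining - consumption
    return {
        "conveyance losses": conveyance_loss_fraction,
        "consumption": consumption,
        "surface discharge": surface_discharge,
    }


fractions = irrigation_discharge_fractions(0.5, 0.25)
```

With a prior consumption fraction of 0.5 and a conveyance loss fraction of 0.25, the three resulting fractions sum to one.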

interflow.sample_data.calc_population_county_weight(df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Calculates the percentage of state total population by county and merges it to the provided dataframe on the variable ‘State’. Used in splitting up state-level estimates to the county level.

Parameters

df (Pandas DataFrame) – dataframe of state-level values to combine with county population weights. Should only include State column as the regional identifier and include state-level values.

Returns

DataFrame of the provided state-level values merged with county population weights
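
Population weighting can be sketched in plain Python as follows. The populations, FIPS codes, and state-level values are hypothetical; the example only shows how a county's share of state population is used to split a state-level estimate.

```python
# Hypothetical county populations keyed by (state, county FIPS).
county_population = {
    ("AL", "01001"): 50_000,
    ("AL", "01003"): 150_000,
    ("GA", "13001"): 20_000,
}

# Total population per state.
state_totals = {}
for (state, _fips), population in county_population.items():
    state_totals[state] = state_totals.get(state, 0) + population

# Each county's share of its state's population.
weights = {
    fips: population / state_totals[state]
    for (state, fips), population in county_population.items()
}

# Hypothetical state-level estimates split to counties by population weight.
state_values = {"AL": 400.0, "GA": 10.0}
county_values = {
    fips: state_values[state] * weights[fips]
    for (state, fips) in county_population
}
```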

interflow.sample_data.calc_pws_commercial_industrial_flows() pandas.core.frame.DataFrame

Calculates public water deliveries to the commercial and industrial sectors based on ratios determined from the 1995 USGS water dataset.

Returns

DataFrame of public water supply demand by commercial and industrial sector

interflow.sample_data.calc_sc_ww_values()

Calculates estimates for wastewater treatment demand data for the state of South Carolina. South Carolina is the only state that does not have wastewater treatment values in the wastewater treatment facility dataset. Total municipal wastewater treatment flows are estimated as the amount of public water supply deliveries to the residential, commercial, and industrial sectors that is not consumed. For the state of South Carolina, therefore, wastewater treatment demand is expected to be equal to wastewater treatment supply with no exports or imports.

Returns

Dataframe of wastewater treatment flows from municipal sources for the state of South Carolina

interflow.sample_data.combine_ww_data() pandas.core.frame.DataFrame

Combines full wastewater demand dataset with estimates calculated for the state of South Carolina to get a complete wastewater treatment dataset by county.

Returns

Dataframe of wastewater treatment values by county

interflow.sample_data.compile_sample_data()

Combines output data from all functions into a single dataset and structures the data for input into the flow Python package.

Returns

Dataframe of fully structured data

interflow.sample_data.convert_kwh_bbtu(value: float) float

Converts energy in kWh to energy in billion btu.

Parameters

value (float) – value in kilowatt-hours of energy

Returns

value in bbtu

interflow.sample_data.convert_mwh_bbtu(value: float) float

Converts energy in MWh to energy in billion btu.

Parameters

value (float) – value in megawatt-hours of energy

Returns

value in bbtu
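
Both conversions reduce to multiplying by a Btu-per-kWh factor and dividing by one billion. The sketch below assumes the common approximation of 3412.14 Btu per kWh; the exact factor used by the package is not stated in this reference, and the function names here are hypothetical.

```python
BTU_PER_KWH = 3412.14  # approximate value; the package's exact factor is an assumption


def kwh_to_bbtu(value):
    """Convert kilowatt-hours to billion Btu."""
    return value * BTU_PER_KWH / 1e9


def mwh_to_bbtu(value):
    """Convert megawatt-hours to billion Btu (1 MWh = 1000 kWh)."""
    return kwh_to_bbtu(value * 1000.0)


one_billion_kwh_in_bbtu = kwh_to_bbtu(1e9)
one_mwh_in_bbtu = mwh_to_bbtu(1.0)
```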

interflow.sample_data.prep_consumption_fraction() pandas.core.frame.DataFrame

Prepares water consumption fractions for sectors not included in the 2015 USGS water dataset by using the consumptive use estimates in the 1995 USGS dataset. For the Residential and Commercial sectors, it is assumed that all water consumed is fresh water. For the Industrial and Mining sectors, separate fresh and saline consumption fractions are calculated.

Returns

DataFrame of consumption fractions for residential, commercial, industrial, mining, livestock, and aquaculture sectors.

interflow.sample_data.prep_corn_crop_irr_flows()

Prepares values for water for corn growth for ethanol, including consumption fractions and surface discharge fractions, and renames fresh surface water withdrawal and fresh groundwater withdrawal values to the proper format.

Returns

DataFrame of corn irrigation consumption fraction and discharge fraction values

interflow.sample_data.prep_county_coal_data() pandas.core.frame.DataFrame

Prepares a dataframe of water type and water source fractions to coal mining by county. These are assumed to be equal to total mining water withdrawal sources and discharge fractions in the 2015 USGS water data. Also develops full variable names for coal production and coal water intensity variables.

Returns

DataFrame of water types and water source fractions to coal production

interflow.sample_data.prep_county_coal_production_data() pandas.core.frame.DataFrame

Prepares a dataframe of coal production by county from surface and underground mines in bbtu. Also creates surface and underground water intensity per bbtu variables.

Returns

DataFrame of coal production values in bbtu by county

interflow.sample_data.prep_county_ethanol_production_data() pandas.core.frame.DataFrame

Takes 2015 EIA data on ethanol plant capacity with locational data and combines it with state-level biomass (ethanol) production data to split out the state total by county. Returns a dataframe of ethanol production (bbtu) by county FIPS for each county in the US for 2015.

Returns

DataFrame of county-level ethanol production values

interflow.sample_data.prep_county_identifier() pandas.core.frame.DataFrame

Prepares a crosswalk of FIPS codes and associated county names so that datasets with just county names can be mapped to the appropriate FIPS codes.

Returns

DataFrame of FIPS code and county name identifier crosswalk

interflow.sample_data.prep_county_natgas_production_data() pandas.core.frame.DataFrame

Prepares a dataframe of natural gas production by county for the year 2015. The dataframe uses 2011 natural gas production (million cubic ft) by county in the US to determine each county's share of its state total. These 2011 shares are applied to 2015 state total natural gas production to get 2015 values at the county level. For some states, no county-level estimates exist in the 2011 data. County-level values for these states are individually provided.

Returns

DataFrame of natural gas production (bbtu) and water use (mgd) by county
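
The share-based allocation from 2011 county data to 2015 state totals can be sketched as below. All numbers and FIPS codes are hypothetical, and the sketch covers only states that have county-level 2011 values.

```python
# Hypothetical 2011 county production (million cubic ft) keyed by (state, county FIPS).
production_2011_mmcf = {("TX", "48001"): 300.0, ("TX", "48003"): 100.0}

# Hypothetical 2015 state total production in bbtu.
state_total_2015_bbtu = {"TX": 2000.0}

# 2011 state totals, used to compute each county's share.
state_total_2011 = {}
for (state, _fips), value in production_2011_mmcf.items():
    state_total_2011[state] = state_total_2011.get(state, 0.0) + value

# Apply each county's 2011 share to the 2015 state total.
county_2015_bbtu = {
    fips: value / state_total_2011[state] * state_total_2015_bbtu[state]
    for (state, fips), value in production_2011_mmcf.items()
}
```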

interflow.sample_data.prep_county_petroleum_production_data() pandas.core.frame.DataFrame

Prepares a dataframe of oil production by county. The dataframe uses 2011 crude oil production (barrels per year) by county in the US to determine each county's share of its state total. These 2011 shares are applied to 2015 state total oil production to get 2015 values at the county level. For states that do not have county values in the 2011 data, individually sourced information is used instead.

Returns

DataFrame of petroleum production (bbtu) by county

interflow.sample_data.prep_county_water_corn_biomass_data() pandas.core.frame.DataFrame

Produces a dataframe of water (MGD) for corn irrigation for ethanol by county. Water intensity applied to all crop irrigation is applied to the irrigation used in the production of corn for ethanol.

Returns

DataFrame of a number of water values for 2015 at the county level

interflow.sample_data.prep_electricity_cooling() pandas.core.frame.DataFrame

Maps cooling water data to power plant generation data and fills missing values with established methodology using water withdrawal and consumption intensity estimates.

Returns

Dataframe of cooling water values by plant.

interflow.sample_data.prep_electricity_cooling_flows() pandas.core.frame.DataFrame

Prepares flows from water supply to thermoelectric cooling and water flows from thermoelectric cooling to consumption, surface discharge, and ocean discharge.

Returns

interflow.sample_data.prep_electricity_demand_data() pandas.core.frame.DataFrame

Prepares electricity demand data by sector from EIA electricity sales data. Produces a dataframe of demand data by county.

Returns

DataFrame of electricity demand data by county

interflow.sample_data.prep_electricity_fuel() pandas.core.frame.DataFrame

Prepares fuel and generation data by power plant ID from EIA 923 data. Bins generation type, prime mover, and maps power plants to location information.

Returns

Dataframe of power plant fuel and generation data by plant ID.

interflow.sample_data.prep_fuel_demand_data() pandas.core.frame.DataFrame

Prepares fuel demand data for the residential, commercial, industrial, and transportation sectors. Returns a dataframe of fuel demand by fuel type and sector in bbtu per day for each county.

Returns

DataFrame of fuel demand values by sector

interflow.sample_data.prep_generation_fuel_flows() pandas.core.frame.DataFrame

Function prepares data flows from fuel supply to electricity generation, electricity generation supply to electricity generation demand, and electricity generation to rejected energy.

Returns

interflow.sample_data.prep_interbasin_transfer_data() pandas.core.frame.DataFrame

Prepares interbasin water transfer data so that output is a dataframe of energy use (BBTU) and total water transferred for irrigation and public water supply in total.

Returns

DataFrame of interbasin transfer water values for 2015 at the county level

interflow.sample_data.prep_irrigation_pws_ratio() pandas.core.frame.DataFrame

Prepares the ratio of water flows to irrigation vs. water flows to public water supply by county. Used to determine the split of electricity in interbasin transfers between the two sectors.

Returns

DataFrame of percentages by county

interflow.sample_data.prep_natgas_water_intensity()

Water withdrawal data is supplied for a select number of states. State totals are split out to counties using the same county percent of total natural gas production as the production calculation. For states with 2015 production but no water withdrawal estimates, the national average water intensity (mg/bbtu) is applied to their natural gas production quantity. It is assumed that 80% of these calculated total water use values come from fresh surface water and 20% from fresh groundwater.

Returns

interflow.sample_data.prep_petroleum_gas_discharge_data() pandas.core.frame.DataFrame

Prepares a dataframe of produced water intensities, consumption fractions, and discharge fractions for petroleum and natural gas production. Note that only unconventional petroleum production results in produced water. Consumption and discharge fractions are assumed for all types of petroleum production.

Returns

DataFrame of produced water intensities, consumption fractions, and discharge fractions for unconventional natural gas and petroleum production

interflow.sample_data.prep_petroleum_water_intensity()

Takes county level petroleum-production values and determines the water intensity for the given county for both unconventional and conventional petroleum production.

Returns

Dataframe of county-level petroleum water intensity values

interflow.sample_data.prep_power_plant_location() pandas.core.frame.DataFrame

Prepares power plant location information to provide a dataframe of power plant codes and their associated FIPS codes. Power plants with unidentified counties are removed from the dataframe. These missing FIPS codes are addressed, if needed, in alternative functions.

Returns

DataFrame of power plant IDs and associated FIPS codes

interflow.sample_data.prep_public_water_supply_fraction() pandas.core.frame.DataFrame

Calculates public water supply deliveries for the commercial and industrial sectors individually as a ratio to the sum of public water supply deliveries to residential end users and thermoelectric cooling. Used in calculation of public water supply demand to commercial and industrial sectors.

Returns

DataFrame of public water supply ratios for commercial and industrial sector.

interflow.sample_data.prep_pumping_energy_fuel_data() pandas.core.frame.DataFrame

Prepares pumping fuel source data so that the output is a dataframe showing the percent of energy for crop irrigation, golf irrigation, aquaculture, livestock, and public water supply that comes from each fuel source type (e.g., electricity, natural gas). Also includes discharge fractions for rejected energy and energy services.

Returns

DataFrame of fuel source fractions, rejected energy fractions, and energy services fractions

interflow.sample_data.prep_pumping_intensity_data() pandas.core.frame.DataFrame

Prepares irrigation data so that the outcome is a dataframe of groundwater and surface water pumping energy intensities (billion BTU per million gallons) by county. For groundwater pumping intensity, the total differential height is calculated as the sum of the average well depth and the pressurization head. The pressure data is provided in pounds per square inch (psi) and is converted to feet of head using a conversion factor of 2.31 feet per psi. This analysis also follows the assumption that average well depth is used instead of depth to water to counteract some of the undocumented friction that would occur in the pumping process. Surface water pumping intensity follows the same methodology as groundwater pumping intensity except the total differential height has a value of zero for well depth.

Returns

DataFrame of irrigation surface and groundwater pumping intensity per county
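
The head calculation above, followed by a conversion to energy per million gallons, might be sketched as follows. Only the 2.31 factor comes from the description; the physical constants, the default `pump_efficiency` of 0.5, and both function names are assumptions made for illustration, so the resulting numbers should not be read as the package's values.

```python
FT_PER_PSI = 2.31                 # conversion stated in the description above
M3_PER_GALLON = 0.003785411784    # US liquid gallon
M_PER_FT = 0.3048
JOULES_PER_BTU = 1055.06
WATER_DENSITY_KG_M3 = 1000.0
GRAVITY_M_S2 = 9.80665


def total_head_ft(well_depth_ft, pressure_psi):
    """Total differential height: well depth plus pressurization head, in feet."""
    return well_depth_ft + pressure_psi * FT_PER_PSI


def pumping_intensity_bbtu_per_mg(head_ft, pump_efficiency=0.5):
    """Energy (billion Btu) to lift one million gallons through head_ft.

    pump_efficiency is a hypothetical placeholder, not a documented value.
    """
    volume_m3 = 1_000_000 * M3_PER_GALLON
    mass_kg = WATER_DENSITY_KG_M3 * volume_m3
    energy_joules = mass_kg * GRAVITY_M_S2 * head_ft * M_PER_FT / pump_efficiency
    return energy_joules / JOULES_PER_BTU / 1e9


# Groundwater uses well depth plus pressure head; surface water uses zero well depth.
groundwater_head = total_head_ft(well_depth_ft=100.0, pressure_psi=40.0)
surface_head = total_head_ft(well_depth_ft=0.0, pressure_psi=40.0)
groundwater_intensity = pumping_intensity_bbtu_per_mg(groundwater_head)
surface_intensity = pumping_intensity_bbtu_per_mg(surface_head)
```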

interflow.sample_data.prep_pws_to_pwd() pandas.core.frame.DataFrame

Calculates public water supply exports, imports, and flows to public water demand based on total public water demand from residential, commercial, and industrial and total public water supply from direct withdrawals and interbasin transfers.

Returns

Dataframe of public water supply flows
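
One way to picture the balance between supply and demand is the sketch below. The rule it applies (surplus supply is exported, unmet demand is imported) is an assumption made for illustration, and `pws_balance` is a hypothetical helper rather than part of the package.

```python
def pws_balance(supply_mgd, demand_mgd):
    """Split public water supply into deliveries to demand, exports, and imports."""
    return {
        "to_demand": min(supply_mgd, demand_mgd),
        "exports": max(supply_mgd - demand_mgd, 0.0),
        "imports": max(demand_mgd - supply_mgd, 0.0),
    }


surplus_county = pws_balance(supply_mgd=12.0, demand_mgd=10.0)
deficit_county = pws_balance(supply_mgd=8.0, demand_mgd=10.0)
```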

interflow.sample_data.prep_pws_treatment_dist_intensity_values()

Prepares energy intensity values for public water supply treatment and distribution

Returns

Dataframe of public water supply intensities

interflow.sample_data.prep_state_fuel_production_data() pandas.core.frame.DataFrame

Prepares state-level fuel production data for petroleum, biomass, natural gas, and coal. Outputs are used to determine county-level fuel production for each fuel type. Values are annual production.

Returns

DataFrame of fuel production data by fuel type and state

interflow.sample_data.prep_wastewater_data() pandas.core.frame.DataFrame

Prepares each wastewater treatment facility data file (water flows, facility locations, facility types, and facility discharge data), cleans the input, and combines the files to produce a single wastewater treatment dataset by county FIPS code.

Returns

DataFrame of wastewater treatment water flows for each county

interflow.sample_data.prep_water_use_1995(variables=None, all_variables=False) pandas.core.frame.DataFrame

Prepares 1995 water use data from USGS by replacing missing values, fixing FIPS codes, and reducing to needed variables.

Parameters
  • variables (list) – None if no specific variables required in addition to FIPS code. Default is None, otherwise a list of additional variables to include in returned dataframe.

  • all_variables (bool) – Include all available variables in returned dataframe. Default is False.

Returns

DataFrame of water values for 1995 at the county level

interflow.sample_data.prep_water_use_2015(variables=None, all_variables=False) pandas.core.frame.DataFrame

Prepares 2015 water use data from USGS. Includes modifications such as replacing non-numeric values, reducing available variables in the output dataframe, renaming variables appropriately, and returning a dataframe of specified variables.

Parameters
  • variables (list) – None if no specific variables required in addition to FIPS code, state name, and county name. Default is None, otherwise a list of additional variables to include in returned dataframe.

  • all_variables (bool) – Include all available variables in returned dataframe. Default is False.

Returns

DataFrame of water withdrawal and consumption values for 2015 at the county level

interflow.sample_data.remove_double_counting_from_mining()

Calculates total water withdrawals in natural gas and petroleum to split them out from all mining water provided in USGS 2015 data. Takes leftover mining water after already subtracting out coal water use.

Returns

DataFrame of recalculated water use in non-energy mining

interflow.sample_data.remove_industrial_water_double_counting()

Removes fresh surface water withdrawals for the production of ethanol in the industrial sector from total fresh surface water withdrawals by the industrial sector from the USGS 2015 dataset to avoid double counting.

Returns

Dataframe of recalculated industrial fresh surface water withdrawal

interflow.sample_data.remove_irrigation_water_double_counting()

Subtracts water use in the irrigation of corn growth for ethanol from the total water use in crop irrigation provided in the 2015 USGS dataset to prevent double counting.

Returns

DataFrame of crop irrigation values for 2015 with ethanol corn irrigation removed

interflow.sample_data.rename_natgas_petroleum_data()

Takes county level natural gas and petroleum production, water intensity, water source, and water discharge data and renames into required variable name structure. Also adds a flow connection between energy production of natural gas and petroleum to energy demand of natural gas and petroleum.

Returns

interflow.sample_data.rename_water_data_2015(variables=None, all_variables=False) pandas.core.frame.DataFrame

Takes USGS 2015 flow values and calculated consumption fractions and renames them with more descriptive names.

Returns

DataFrame of 2015 water flows and consumption fractions for agriculture

interflow.visualize.plot_map(jsonfile: dict, data: pandas.core.frame.DataFrame, level=1, strip=None, center=None)

Takes flow package output and plots a choropleth map of an individual value. Displays the first flow value in the dataset by default and produces a drop-down menu of the remaining flows to select from and update the map. Requires a GeoJSON file containing the geometry information for the region of interest. The feature.id in the file must align with the region data column in the dataframe of input values to display. Flow values can be displayed on the map and represented in the dropdown menu for the indicated level of granularity (level 1 through level 5, inclusive). Additionally, an optional parameter is provided to display additional regional identification information in the hover-template when a region is hovered over. This is provided in the region_col parameter and points to the column in the input data with this information.

Parameters
  • jsonfile (dict) – loaded GeoJSON dictionary containing geometry information for the values to be plotted on the map. The feature.id in the file must align with the region data column in the dataframe of input values to display.

  • data (DataFrame) – dataframe of flow values from source to target by region

  • level (int) – level of granularity to display for values. Level should be an integer between 1 and 5 inclusive. Default is set to level 1 granularity.

  • strip (str) – optional parameter to provide a string that will be removed from the labels in the output. For example, if the input data has a repeated word such as ‘total’ for numerous levels, the word ‘total’ will be stripped. Default is set so that no words are stripped.

  • center (dict) – dictionary of coordinates in the form of {"lat": 37.0902, "lon": -95.7129} which centers the displayed map. Default center coordinates are {"lat": 37.0902, "lon": -95.7129}.

Returns

choropleth map shaded by value for all regions provided at level specified and for specified units.
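
Because the feature.id values in the GeoJSON must align with the region column of the data, a quick pre-check before plotting can catch mismatches. The sketch below is a hypothetical helper pattern using a tiny hand-built GeoJSON dictionary and made-up region identifiers; it is not part of the package.

```python
# Hypothetical, minimal GeoJSON dictionary; real files carry actual geometries.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "01001", "properties": {}, "geometry": None},
        {"type": "Feature", "id": "01003", "properties": {}, "geometry": None},
    ],
}

# Hypothetical region identifiers taken from the region column of the flow data.
region_ids = ["01001", "01003", "01005"]

# Regions in the data that have no matching feature.id would not be drawn.
feature_ids = {feature["id"] for feature in geojson["features"]}
missing_from_geojson = sorted(set(region_ids) - feature_ids)
```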

interflow.visualize.plot_sankey(data, unit_type1, output_level=1, unit_type2=None, region_name=None, strip=None, remove_sectors=None)

Plots interactive sankey diagram(s) for a given region at a given level of granularity from package output data. Requires that variable naming is consistent with flow package output variable naming. At least one unit type must be specified as a parameter. Output level can be specified to display sankey diagrams at different levels of granularity. Sankey diagram(s) can only display a single region at a time. If no region name is specified, the flow data provided must be for a single region. Contains the option to strip strings from node names to remove replicated placeholder names such as ‘total’. On hover, the flow values are displayed. Note that an ‘m’ following a value is the SI prefix for milli (thousandths), indicating that the value shown is a decimal. For example, 800m is equivalent to 0.80.

Parameters
  • data (DataFrame) – dataframe of flow values from source to target. Must be provided at level 5 granularity.

  • unit_type1 (string) – units of the first set of flow values (e.g., mgd)

  • output_level (int) – level of granularity of values returned in the figure.

  • unit_type2 (string) – units of the second set of flow values (e.g., bbtu)

  • region_name (string) – Name of region to display values for if input data includes multiple regions. If none is specified, data must be for a single region.

  • strip (string) – Optional parameter. Provides a string to remove from variable labels.

  • remove_sectors (list) – Optional parameter to remove all flows into and out of a level 1 sector. Removes values at all levels for specified sector.

Returns

interactive Sankey diagram of flow values
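
The single-region rule described above can be illustrated with a small sketch. This is not the package's implementation: the helper select_region, the "region" key, and the choice to raise ValueError are assumptions made for illustration, and plain dictionaries stand in for DataFrame rows.

```python
from typing import Optional


def select_region(rows, region_name: Optional[str] = None, region_key: str = "region"):
    """Return the rows for one region, mirroring the rule described above.

    Hypothetical helper: rows is a list of dicts standing in for DataFrame rows.
    """
    regions = sorted({row[region_key] for row in rows})
    if region_name is None:
        # Without a region name, the data must already cover exactly one region.
        if len(regions) != 1:
            raise ValueError(
                f"expected data for a single region, found {len(regions)}"
            )
        return list(rows)
    if region_name not in regions:
        raise ValueError(f"region {region_name!r} not found in data")
    return [row for row in rows if row[region_key] == region_name]


# Hypothetical flow rows for two regions.
sample_rows = [
    {"region": "A", "value": 1.0},
    {"region": "A", "value": 2.0},
    {"region": "B", "value": 5.0},
]
single_region_rows = [row for row in sample_rows if row["region"] == "B"]

print(select_region(sample_rows, "A"))
print(select_region(single_region_rows))
```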

interflow.visualize.plot_sector_bar(data, unit_type, region_name, sector_list, inflow=True, strip=None)

Plots a stacked barchart for a single region of inflows or outflows for selected sectors in selected units. The stacked bars represent the highest level of granularity available for each major sector. For example, if there are values for water flows into the public water supply sector, specifically tied to the water flows for fresh surface water imports, then one of the stacked components in the public water supply bar in the chart will be equal to the value of this specific sub-sector flow.

Parameters
  • data (DataFrame) – dataframe of flow values from source to target

  • unit_type (str) – unit type to be displayed, must be equal to resource unit type in input data

  • region_name (str) – name of region to display values for.

  • sector_list (list) – list of major sectors to include stacked values for as strings. Strings must be provided at level 1 granularity. For example, providing sector_list=[‘Public Water Supply’, ‘Residential’] will show all of the subsector inflows or outflows for those sectors.

  • inflow (bool) – If True, shows inflows into each specified sector. If False, shows outflows. Default is set to True. Note that inflows are reflected in terms of the destination subsector, not the source of the inflow. For example, indicating public water supply as a sector and setting inflow to True will show the values attributed to each of the public water supply subsectors, e.g., energy demand for fresh surface water pumping in the public water supply sector. Outflows, on the other hand, reflect the destination of the outflow. For example, if inflow is set to False, the chart shows which downstream sectors the indicated major sector sends its resources to.

  • strip (str) – optional parameter to provide a string that will be removed from the labels in the output. For example, if the input data has a repeated word such as ‘total’ for numerous levels, the word ‘total’ will be stripped.

Returns

stacked barchart for a single region of inflows or outflows for selected sectors in selected units.
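
The difference between the inflow and outflow views can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the helper stacked_components and the keys source_sector, target_sector, and target_subsector are hypothetical stand-ins for the columns of the input DataFrame.

```python
from collections import defaultdict


def stacked_components(rows, sector_list, inflow=True):
    """Sum flow values into the components that would be stacked in each bar.

    Hypothetical helper: rows is a list of dicts standing in for DataFrame rows.
    """
    stacks = defaultdict(lambda: defaultdict(float))
    for row in rows:
        if inflow:
            # Inflows: bar per receiving sector, stacked by its own subsector.
            sector, component = row["target_sector"], row["target_subsector"]
        else:
            # Outflows: bar per sending sector, stacked by downstream sector.
            sector, component = row["source_sector"], row["target_sector"]
        if sector in sector_list:
            stacks[sector][component] += row["value"]
    return {sector: dict(parts) for sector, parts in stacks.items()}


# Hypothetical flows into and out of the public water supply sector.
sample_flows = [
    {"source_sector": "Water Supply", "target_sector": "Public Water Supply",
     "target_subsector": "fresh surface water imports", "value": 10.0},
    {"source_sector": "Water Supply", "target_sector": "Public Water Supply",
     "target_subsector": "fresh groundwater withdrawal", "value": 5.0},
    {"source_sector": "Public Water Supply", "target_sector": "Residential",
     "target_subsector": "delivery", "value": 12.0},
    {"source_sector": "Public Water Supply", "target_sector": "Commercial",
     "target_subsector": "delivery", "value": 3.0},
]

inflow_stacks = stacked_components(sample_flows, ["Public Water Supply"], inflow=True)
outflow_stacks = stacked_components(sample_flows, ["Public Water Supply"], inflow=False)
print(inflow_stacks)
print(outflow_stacks)
```

With inflow=True the bar is split by the receiving subsectors, while with inflow=False it is split by the downstream sectors that receive the resource.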

Test Validation Suite

interflow contains automated tests for checking correctness. Tests are run automatically with pytest, a unit testing framework, through GitHub Actions whenever a push or a pull request is made. Test files developed for interflow can also be run locally as individual scripts using the unittest module. All test scripts can be found in the tests folder in the interflow repository. For more information on unittest and instructions on how to run test files from the command line, see the unittest documentation.

Below are short descriptions of some of the test cases within the test suite. This is not intended to be an exhaustive list of tests included in the package but instead provides a general idea of the types of tests included. To see the test files for the package, visit the test directory linked above.

test_analyze.py - Includes tests for functions of analyze.py. Tests to make sure function returns the expected data structure (e.g., Pandas DataFrame), that grouping results at various levels gives expected DataFrame shape, and that grouping data at various levels through parameter inputs gives the correct sum of data values in the output.
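
The property these tests check, that grouping to a coarser level changes the number of rows but not the total, can be sketched without the package. The example below is a stand-in, not interflow's group_results: the helper group_to_level, the tuple layout, and the sector labels are hypothetical.

```python
from collections import defaultdict


def group_to_level(flows, output_level=1):
    """Sum flows that share the same region and first output_level name parts.

    Hypothetical stand-in: each flow is (region, source_levels, target_levels, value),
    where source_levels and target_levels are 5-part tuples of names.
    """
    if not 1 <= output_level <= 5:
        raise ValueError("output_level must be an integer between 1 and 5, inclusive")
    grouped = defaultdict(float)
    for region, source, target, value in flows:
        key = (region, source[:output_level], target[:output_level])
        grouped[key] += value
    return dict(grouped)


# Hypothetical level 5 flows for one region.
sample_flows = [
    ("01001", ("WSW", "fresh", "surface", "total", "total"),
     ("PWS", "fresh", "surface", "withdrawal", "total"), 4.0),
    ("01001", ("WSW", "fresh", "ground", "total", "total"),
     ("PWS", "fresh", "ground", "withdrawal", "total"), 6.0),
]

level1 = group_to_level(sample_flows, output_level=1)
level3 = group_to_level(sample_flows, output_level=3)

# Fewer rows at the coarser level, same total at both levels.
print(len(level1), len(level3))
print(sum(level1.values()), sum(level3.values()))
```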

test_calc_flow.py - Includes tests for functions of calc_flow.py. There are various types of tests within test_calc_flow.py including:

  • tests to confirm that a ValueError gets raised if incorrect input data is provided (e.g., invalid output granularity specified, invalid region name specified, incorrect input data shape provided)

  • tests that specifying a desired output level as a parameter gives the expected output granularity

  • tests various combinations of sample data values to check that output flows are calculated as expected
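
A test of the first kind listed above might look like the following sketch. It does not reproduce the package's test code: validate_level is a hypothetical stand-in for the input validation, used here so that the example runs without interflow installed.

```python
import unittest


def validate_level(level):
    """Hypothetical stand-in for validating the output granularity level."""
    is_int = isinstance(level, int) and not isinstance(level, bool)
    if not is_int or not 1 <= level <= 5:
        raise ValueError("level must be an integer between 1 and 5, inclusive")
    return level


class TestLevelValidation(unittest.TestCase):
    def test_invalid_level_raises(self):
        # Each invalid input should raise a ValueError.
        for bad in (0, 6, "5", 2.5):
            with self.subTest(level=bad):
                with self.assertRaises(ValueError):
                    validate_level(bad)

    def test_valid_levels_pass(self):
        # Levels 1 through 5 are accepted unchanged.
        for good in range(1, 6):
            self.assertEqual(validate_level(good), good)


# Run the test case programmatically and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestLevelValidation)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Saved in a file, the same test case could also be run from the command line with python -m unittest.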

test_construct.py - Includes tests for functions of construct.py including:

  • tests to confirm that a ValueError gets raised if incorrect input data is provided (e.g., invalid input DataFrame shape)

  • tests that the output is in the expected format (e.g., Pandas DataFrame)

  • tests that the output is at the expected level of data granularity.

test_deconstruct.py - Includes tests for functions of deconstruct.py including the following:

  • tests that the output is in the expected data format (e.g., Pandas DataFrame)

  • tests that tabular output DataFrames are the expected shape

  • tests to make sure there are no missing values in output

  • tests to confirm that a ValueError gets raised if incorrect input data is provided (e.g., invalid input data structure)

test_reader.py - Includes tests for functions of reader.py. Functions within reader are used to read in data files for use in other functions. Each of the functions in reader.py, therefore, is tested to make sure it loads the input file and returns the expected data structure type.

test_sample_data.py - Includes tests for functions of sample_data.py. The functions within sample_data.py collect and organize the US county sample input data provided with the package. The tests of these functions can vary given that each function handles different sample data organization components. Some of the general types of tests included in test_sample_data.py are:

  • tests of conversion functions to confirm they are correctly converting units

  • tests that output DataFrames have the expected shape

  • tests to make sure no data is missing in output DataFrames

  • tests to make sure that removed data items do not appear in output DataFrames

  • tests to make sure that output DataFrames include the expected columns

  • tests to make sure that values in output DataFrames are within the expected range (e.g., fractions fall between 0 and 1, inclusive)

  • tests that the output is as expected when functions are run with default parameters
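
Several of the checks listed above (expected columns, no missing values, fractions within range) can be sketched as one helper. This is an illustration only: find_output_problems and the column names are hypothetical, and lists of dictionaries stand in for the DataFrames that the package's tests operate on.

```python
import math


def find_output_problems(rows, expected_columns, fraction_columns):
    """Return a list of human-readable problems found in the output rows.

    Hypothetical helper: rows is a list of dicts standing in for DataFrame rows.
    """
    problems = []
    for i, row in enumerate(rows):
        absent = [col for col in expected_columns if col not in row]
        if absent:
            problems.append(f"row {i}: missing columns {absent}")
            continue
        for col in expected_columns:
            value = row[col]
            if value is None or (isinstance(value, float) and math.isnan(value)):
                problems.append(f"row {i}: missing value in {col}")
        for col in fraction_columns:
            value = row.get(col)
            # Only numeric, non-missing values are range-checked.
            if isinstance(value, (int, float)) and not math.isnan(value):
                if not 0 <= value <= 1:
                    problems.append(f"row {i}: {col}={value} outside [0, 1]")
    return problems


# Hypothetical output rows: one clean, one with an out-of-range fraction,
# one with a missing value.
columns = ["region", "discharge_fraction"]
sample_output = [
    {"region": "01001", "discharge_fraction": 0.25},
    {"region": "01003", "discharge_fraction": 1.5},
    {"region": "01005", "discharge_fraction": float("nan")},
]

problems = find_output_problems(sample_output, columns, ["discharge_fraction"])
for problem in problems:
    print(problem)
```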