The IMSL_EXACT_NETWORK function computes Fisher exact probabilities and a hybrid approximation of the Fisher exact method for a two-way contingency table using the network algorithm.

The IMSL_EXACT_NETWORK function computes Fisher exact probabilities, or a hybrid algorithm approximation to Fisher exact probabilities, for a N_rows by N_columns contingency table with fixed row and column marginals. A marginal is the number of counts in a row or column.

Let fij denote the count in row i and column j of a table, and let fi and f•j denote the row and column marginals. Under the hypothesis of independence, the (conditional) probability of the fixed marginals of the observed table is given by: Where:

• f•• is the total number of counts in the table
• Pf corresponds to the output keyword PROB_TABLE.

A more extreme table X is defined in the probabilistic sense as more extreme than the observed table if the conditional probability computed for table X (for the same marginal sums) is less than the conditional probability computed for the observed table. Note that this definition can be considered two-sided in the cell counts.

## Example

This example demonstrates various methods of computing chi-squared p-values with respect to accuracy. As seen in the output of this example, the Fisher exact probability and the usual asymptotic chi-squared probability (generated using IMSL_CONTINGENCY) can be different.

`.RUN`
`PRO print_results, p, p2, p3, p4`
`PRINT, 'Asymptotic Chi-Squared p-value'`
`PRINT, 'p-value =', p`
`PRINT, 'Network Algorithm with Approximation'`
`PRINT, 'p-value =', p2`
`PRINT, 'Network Algorithm without Approximation'`
`PRINT, 'p-value =', p3`
`PRINT, 'Total Enumeration Method'`
`PRINT, 'p-value =', p4`
`END`
`table = TRANSPOSE([[20, 20, 0, 0, 0], [10, 10, 2, 2, 1], \$ `
`  [20, 20, 0, 0, 0]])`
`p	= IMSL_CONTINGENCY(table)`
`p2 = IMSL_EXACT_NETWORK(table)`
`p3 = IMSL_EXACT_NETWORK(table, /NO_APPROX)`
`p4 = IMSL_EXACT_ENUM(table)`
`print_results, p, p2, p3, p4`

IDL prints:

`Asymptotic Chi-Squared p-value `
`p-value =	0.0322604`
`Network Algorithm with Approximation`
`p-value =	0.0601165`
`Network Algorithm without Approximation `
`p-value =	0.0598085`
`Total Enumeration Method `
`p-value =	0.0597294`

## Errors

### Warning Errors

STAT_HASH_TABLE_ERROR_2: The value "ldkey" = # is too small. "ldkey" is calculated as Wk_Params(0)*pow(10, N_Attempts−1) ending this execution attempt.

STAT_HASH_TABLE_ERROR_3: The value "ldstp" = # is too small. "ldstp" is calculated as Wk_Params(1)*pow(10, N_Attempts−1) ending this execution attempt.

### Fatal Errors

STAT_HASH_TABLE_ERROR_1: The hash table key cannot be computed because the largest key is larger than the largest representable integer. The algorithm cannot proceed.

## Syntax

Result = IMSL_EXACT_NETWORK(Table [, APPROX_PARAMS=array] [, /DOUBLE] [, /NO_APPROX] [, P_VALUE=variable] [, PROB_TABLE=variable] [, WK_PARAMS=array])

## Return Value

The p-value for independence of rows and columns. The p-value represents the probability of a more extreme table where "extreme" is taken in the Neyman-Pearson sense. The p-value is "two-sided".

## Arguments

### Table

Two-dimensional array containing the observed counts in the contingency table.

## Keywords

### APPROX_PARAMS (optional)

One-dimensional array of size 3. APPROX_PARAMS(0) is the expected value used in the hybrid approximation to Fisher’s exact test algorithm for deciding when to use asymptotic probabilities when computing path lengths. APPROX_PARAMS(1) is the percentage of remaining cells that must have estimated expected values greater than APPROX_PARAMS(0) before asymptotic probabilities can be used in computing path lengths. APPROX_PARAMS(2) is the minimum cell estimated value allowed for asymptotic chi-squared probabilities to be used.

Asymptotic probabilities are used in computing path lengths whenever APPROX_PARAMS(1) or more of the cells in the table have estimated expected values of APPROX_PARAMS(0) or more, with no cell having expected value less than APPROX_PARAMS(2). See the description at the beginning of this topic for details.

Defaults:

APPROX_PARAMS(0) = 5.0

APPROX_PARAMS(1) = 80.0

APPROX_PARAMS(2) = 1.0

These defaults correspond to the "Cochran" condition.

### DOUBLE (optional)

If present and nonzero, double precision is used.

### NO_APPROX (optional)

If present and nonzero, the Fisher exact test is used and APPROX_PARAMS is ignored.

### P_VALUE (optional)

Named variable containing the p-value for independence of rows and columns. The p-value represents the probability of a more extreme table where "extreme" is in the Neyman-Pearson sense. The P_Value is "two-sided." The p-value is also returned in functional form.

A table is more extreme if its probability (for fixed marginals) is less than or equal to PROB_TABLE.

### PROB_TABLE (optional)

Named variable containing the probability of the observed table occurring, given that the null hypothesis of independent rows and columns is true.

### WK_PARAMS (optional)

One-dimensional array of size 3. The network algorithm requires a large amount of workspace. Some of the workspace requirements are well-defined, while most of the workspace requirements can only be estimated. The estimate is based primarily on table size.

The IMSL_EXACT_ENUM function allocates a default amount of workspace suitable for small problems. If the algorithm determines that this initial allocation of workspace is inadequate, the memory is freed, a larger amount of memory allocated (twice as much as the previous allocation), and the network algorithm is re-started. The algorithm allows for up to WK_PARAMS(2) attempts to complete the algorithm.

Because each attempt requires computer time, it is suggested that WK_PARAMS(0) and WK_PARAMS(1) be set to some large numbers (like 1,000 and 30,000) if the problem to be solved is large. It is suggested that WK_PARAMS(1) be 30 times larger than WK_PARAMS(0). Although IMSL_EXACT_ENUM will eventually work its way up to a large enough memory allocation, it is quicker to allocate enough memory initially.

The known (well-defined) workspace requirements are as follows:

• f•• = ΣΣfij is equal to the sum of all cell frequencies in the observed table
• nt = f•• + 1
• mx = max(N_rows, N_columns)
• mn = min(N_rows, N_columns)
• t1 = max(800 + 7mx, (5 +2mx) (N_rows + N_columns + 1) )
• t2 = max(400 + mx, + 1, N_rows + N_columns + 1)

IN IDL terms:

• N_rows = N_ELEMENTS(Table(*,0))
• N_columns = N_ELEMENTS(Table(0,*)).

The amount of integer workspace allocated is 3mx + 2mn + t1.

The amount of real workspace allocated is nt + t2.

The remainder of workspace that is required must be estimated and allocated based on WK_PARAMS(0) and WK_PARAMS(1).

• The amount of integer workspace allocated is 6n * (WK_PARAMS(0) + WK_PARAMS(1)).
• The amount of real workspace allocated is n, where n is (6*WK_PARAMS(0) + 2* Wk_Params(1)).
• Variable n is the index for the attempt, 1 < n ≤ Wk_Params(2).

Defaults:

WK_PARAMS(0) = 100

WK_PARAMS(1) = 3000

WK_PARAMS(2) = 10

## Version History

 6.4 Introduced