SEGREGATION INDEX CALCULATOR (SEGCALC)            


1. INTRODUCTION

Segregation Index Calculator (SEGCALC) was developed by Dr. Stavros
Konstantinidis (dept. of Mathematics and Statistics) and Dr. Ivan
Townshend (dept. of Geography) at the University of Lethbridge (Alberta)
during the months of May and July of 1997. The motivation for this
project was the need by 
researchers to compute geographical indices for the purpose of evaluating 
the degree of segregation of a city's population groups. 
SEGCALC implements the following 18 indices:

(a) ACE (absolute centralization),    (b) ACL (absolute clustering),
(c) ACO (absolute concentration),     (d) Atkinson's,
(e) Delta,                            (f) Dissimilarity,
(g) DPxx* (distance decay isolation), (h) DPxy* (distance decay interaction),
(i) Entropy,                          (j) Eta squared (correlation ratio),
(k) Gini coefficient,                 (l) PCC (proportion central city),
(m) RCE (relative centralization),    (n) RCL (relative clustering), 
(o) RCO (relative concentration),     (p) SP (spatial proximity),
(q) xP*x (isolation),                 (r) xP*y (interaction).

The formulas used to compute the above indices are taken from Massey
and Denton (1988) with the exception of DPxx* and DPxy* which are
implemented according to the correct formula given in Morgan (1983).
For the indices that use the contiguity between areal units as a
function of the distance between their centroid coordinates, SEGCALC 
assumes that the contiguity between an areal unit and itself is equal 
to 1 (see page 294 of Massey and Denton, 1988).
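
As an illustration of the self-contiguity convention above, the following
Python sketch builds a contiguity matrix from centroid coordinates. It
ASSUMES a negative exponential of the Euclidean distance, exp(-d), as the
contiguity function; the function names are hypothetical and the code is
not taken from SEGCALC itself.

```python
import math

def contiguity(xi, yi, xj, yj):
    """Assumed contiguity weight: exp(-d), d = Euclidean centroid distance."""
    d = math.hypot(xi - xj, yi - yj)
    return math.exp(-d)

def contiguity_matrix(centroids):
    """centroids: list of (x, y) pairs, one per areal unit."""
    n = len(centroids)
    c = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(centroids):
        for j, (xj, yj) in enumerate(centroids):
            # Convention stated above: a unit's contiguity with itself is 1.
            c[i][j] = 1.0 if i == j else contiguity(xi, yi, xj, yj)
    return c

c = contiguity_matrix([(1.0, 10.0), (4.5, 10.5), (3.5, 1.5)])
```

Because the weight depends on the raw distance, changing the coordinate
unit (e.g. metres to km) changes every off-diagonal entry.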

While testing the effectiveness of SEGCALC, the following problems were
identified (for further details, see also Townshend, Konstantinidis and
Walker, in progress):
- It is possible that the ACL index produces negative values, despite
  the claim of Massey and Denton that its values always range between
  0 and 1.
- There are situations where the values of the indices ACO and RCO are 
  undefined (i.e., the formulas produce 0/0).

The rest of this document describes the capabilities of SEGCALC.
Section 2 gives some general suggestions of how to structure 
the input data files used by SEGCALC. Section 3 contains further
details on the input data files and defines which data lines are
considered valid and processed by SEGCALC. Section 4 describes the 
set of permissible user inputs and how SEGCALC responds to those.
Section 5 lists the possible error messages and warnings of SEGCALC.
Finally, Section 6 states the operating system requirements to use 
SEGCALC.



2. RECOMMENDED INPUT DATA FILE CONFIGURATION

To compute ALL indices in SEGCALC it is advisable (but not essential) to 
set up your raw data file (ASCII format) as follows:
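 Column 1: unique identifier for row or record (e.g. a numeric census tract 
  code). This column is not used in the computations but is recommended as 
  a record identifier. Note that Section 3 treats a data-line containing 
  any non-numeric element as invalid, so the identifier must be a number.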
 Column 2 (referred to as I6 below): Area of the census tract (note: SEGCALC 
  assumes that the area of the city is the sum of all census tract areas).
  If you are not using the area-based indices (DEL, ACO, RCO, ACE) you can 
  omit this column.
 Column 3 (referred to as I7 below): the geographic x-coordinate of the 
  census tract centroid (e.g. metres or km). 
  Note: The distance-based indices (PCC, RCE, ACE, ACL, RCL, SP, DPxy, DPxx) 
  are sensitive to the unit of measurement. 
  If you are not using indices based on distances (from each other or from 
  the central business district, CBD) you can omit columns 3-6.
 Column 4 (referred to as I8 below): the geographic y-coordinate of the 
  census tract centroid (same units as x coordinate above).
  If you are not using indices (PCC, RCE, ACE, ACL, RCL, SP, DPxy, DPxx) 
  based on distances (from each other or from the CBD) you can omit 
  columns 3-6.
 Column 5 (referred to as I9 below): the x coordinate of the central 
  business district (CBD) or city centre (must be same unit of measure as 
  the tract coordinates). If you are not using indices (PCC, RCE, ACE, ACL, 
  RCL, SP, DPxy, DPxx) based on distances (from each other or from the CBD) 
  you can omit columns 3-6.
 Column 6 (referred to as I10 below): the y coordinate of the central 
  business district or city centre (must be same unit of measure as the 
  tract coordinates). If you are not using indices (PCC, RCE, ACE, ACL, 
  RCL, SP, DPxy, DPxx) based on distances (from each other or from the CBD) 
  you can omit columns 3-6.
 Column 7 (referred to as I5 below): First column of data (i.e. population 
  segment such as age group, ethnic group, etc.)
 Columns 8 to LAST column: remaining population segments. Note: the number 
  of population segment columns (e.g. 7 to LAST) plus the number of ID and
  geographic reference columns (e.g. columns 1 through 6) = the total number
  of columns in the input data file (referred to as I4 below).

IMPORTANT:

SEGCALC indices are all based on dichotomous analysis, i.e., segment vs. 
non-segment, where non-segment is defined as the total tract population 
minus the segment. E.g.,
 if segment = Seniors, non-segment = total tract population minus Seniors.
 if segment = Blacks,  non-segment = total tract population minus Blacks.

SEGCALC does not make provision for reading (from the raw data input file) a 
separate column that contains the total tract populations. SEGCALC computes 
the total tract population as the sum of the individual segments (e.g. 
cols 7 to LAST).  This means that if your analysis is only concerned with 
one population segment, you can simplify the input data file by simply 
entering two columns of data: the first for the population of that segment, 
and the second as the difference between total population and that segment.
In all cases, AT LEAST TWO segment data columns are required.
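
The segment vs. non-segment split described above can be sketched in Python
as follows. The function names are hypothetical and this is not SEGCALC's
own code; the dissimilarity formula used here is the standard two-group
form, D = 0.5 * sum |x_i/X - y_i/Y|, shown only to illustrate how the
dichotomy feeds an index.

```python
def dichotomize(rows, k):
    """rows: per-tract lists of segment counts; k: 0-based segment position.

    Returns (segment, non_segment) pairs, where the tract total is the
    sum of all segment columns, as described above.
    """
    pairs = []
    for counts in rows:
        total = sum(counts)
        pairs.append((counts[k], total - counts[k]))
    return pairs

def dissimilarity(pairs):
    """Two-group dissimilarity index; assumes both city totals are positive."""
    seg_total = sum(x for x, _ in pairs)
    non_total = sum(y for _, y in pairs)
    return 0.5 * sum(abs(x / seg_total - y / non_total) for x, y in pairs)

# Three tracts with two segment columns (values from the example file below).
tracts = [[300, 700], [1000, 5000], [600, 2000]]
d_first = dissimilarity(dichotomize(tracts, 0))
```

With only two segment columns, each column is the non-segment of the other,
so both segments yield the same dissimilarity value.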

SEGCALC also requires complete (valid) records. Records containing missing 
data will not be processed. 

EXAMPLE DATA FILE: (note that a fixed field format is not required)
Ctname  Area(km2)  CTX  CTY  CBDX CBDY  SEGMENT1 SEGMENT2...SEGMENTM
100       8        1.0  10.0  5    6     300       700   ...   xxx
101      15        4.5  10.5  5    6    1000      5000   ...   xxx
 .       .          .    .    .    .      .         .           .
 .       .          .    .    .    .      .         .           .
 .       .          .    .    .    .      .         .           .
200      21        3.5  1.5   5    6     600      2000   ...   xxx



3. TECHNICAL INPUT DATA FILE SPECIFICATIONS

SEGCALC processes tabular data that describe the geographical distribution
of the population groups in a given city (see table below). The city in
question consists of N areal units and its population is made up of M
different groups. 

     g11  g12 ..... g1L   p11  p12 ..... p1M
     g21  g22 ..... g2L   p21  p22 ..... p2M
     ...  ... ..... ...   ...  ... ..... ...
     ...  ... ..... ...   ...  ... ..... ...
     gN1  gN2 ..... gNL   pN1  pN2 ..... pNM

The rows of the table represent the city's areal units. Each row i consists
of two components:
- The geography component (referred to as G-component in the sequel) with
  elements gi1, gi2,..., giL that represent data about the geography of 
  the areal unit. For example, a typical G-component consists of 5 numbers:
  (a) the x-coordinate of the unit's centre, (b) the y-coordinate of the 
  unit's centre, (c) the land area of the unit, (d) the x-coordinate of
  the central business district, and (e) the y-coordinate of the central
  business district. Note that the pair of numbers in (d) and (e) would be 
  the same in every areal unit --their purpose being to compute the distance
  between the unit's centre and the centre of the central business district.
- The population component (referred to as P-component in the sequel) with
  elements pi1, pi2,..., piM that represent the populations of the M
  groups living in the unit.

SEGCALC uses the following terminology and assumptions about the rows of
the table and the lines of the input file:

1. A data-line is a line of the input file that contains at least one
   non-whitespace character (whitespace characters are the space, the
   horizontal tab, and the new-line character).
2. Each row of the table occupies exactly one data-line of the input file.
3. The last row of the table corresponds to the data-line that
   immediately precedes the first non-data-line that appears in the file 
   (if such a line exists), or the end-of-file character. 
4. The elements of a data-line are delimited by at least one whitespace
   character. 
5. In every row, the elements of the G-component precede the elements of
   the P-component.
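
Assumptions 1 to 4 above can be sketched in Python as follows. This is an
illustration only: the function name is hypothetical, and token splitting
uses Python's default whitespace rules, which are slightly broader than the
three whitespace characters listed in item 1.

```python
WHITESPACE = " \t\n"  # the whitespace characters named in item 1

def read_data_lines(text):
    """Return the elements of each data-line up to the first non-data-line."""
    rows = []
    for line in text.split("\n"):
        if not line.strip(WHITESPACE):
            # First line without a non-whitespace character: the table ends.
            break
        # Elements are delimited by one or more whitespace characters.
        rows.append(line.split())
    return rows

sample = "100 8 1.0\n101   15\t4.5\n\n200 21 3.5\n"
rows = read_data_lines(sample)  # the line after the blank line is ignored
```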

SEGCALC reads the lines of the input file until the end-of-file character
is encountered or the first non-data-line is read. Only the valid data-lines
are processed, however. The validity of a data-line depends on the user's 
specifications as follows. Near the beginning of its execution, SEGCALC
asks the user to enter the total number, L+M, of columns in the table, 
and the column number, L+1, of the first population group. Then, a
data-line is considered valid if and only if 
- the data-line consists of exactly L+M elements, and
- each element is a number, and
- each element of the last M columns is a non-negative integer, and
- the sum of the last M elements of the line is a positive integer (i.e.,
  the total population of the areal unit is at least one), and
- the land area of the row is positive, provided that the column
  of the land areas is included in the input file.
In any other case, the data-line is considered invalid and not processed 
by SEGCALC. For example, a data-line is invalid if it contains a letter,
or it contains more than L+M elements, or it consists of fewer than
L+M elements, or a number in column L+1 is negative, and so on.
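
The validity conditions above can be expressed as a small Python check.
This is a sketch, not SEGCALC's code: the function and parameter names are
hypothetical, column numbers are 1-based as in the prompts of Section 4,
and number parsing relies on Python's float(), which may accept notations
(e.g. 1e3) that SEGCALC treats differently.

```python
import math

def is_valid_line(elements, total_cols, first_pop_col, area_col=0):
    """elements: list of string tokens from one data-line.

    total_cols is L+M, first_pop_col is L+1, and area_col is the land-area
    column (0 if the file has no such column).
    """
    if len(elements) != total_cols:
        return False                      # not exactly L+M elements
    try:
        values = [float(e) for e in elements]
    except ValueError:
        return False                      # some element is not a number
    if not all(math.isfinite(v) for v in values):
        return False                      # reject "nan" and "inf"
    pops = values[first_pop_col - 1:]
    if any(p < 0 or p != int(p) for p in pops):
        return False                      # populations: non-negative integers
    if sum(pops) < 1:
        return False                      # total population at least one
    if area_col and values[area_col - 1] <= 0:
        return False                      # land area must be positive
    return True

ok = is_valid_line("100 8 1.0 10.0 5 6 300 700".split(), 8, 7, area_col=2)
```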



4. USING SEGCALC

To run SEGCALC, the user types "SEGCALC" at the operating system prompt.
Then, the following information is requested from the user:

I1. The name of the input file that contains the table of geographical data.
    SEGCALC will ask for another name if the file cannot be found.
I2. The name of the output file in which the results will be stored. If
    no name is given the results are only shown on the screen.
I3. The number of rows in the table (i.e., the number of areal units in
    the city). This number must be between 1 and 2000; otherwise SEGCALC
    will ask for a new number. You can enter a number in excess of the 
    actual number of records in your file. Do not enter a number less than
    the number of records in your file, or SEGCALC will not process the
    records beyond that number.
I4. The total number of columns in the table (i.e., the number of elements
    in the G-component plus the number of elements in the P-component of
    each row). This number must be between 2 and 60; otherwise SEGCALC will
    ask for a new number. If the number is 2, the inputs I5 to I10 will 
    be omitted. (Note: the total number of columns must include all those
    which are placed before your first column of SEGMENT data).
I5. The column number, say F, of the first population group (segment) 
    in the table.
    This number must be between 1 and C-1, where C is the number given
    in I4; otherwise SEGCALC will ask for a new number. If F is 1, the 
    inputs I6--I10 are omitted.
I6. The number of the column that contains the sizes of the land areas in
    the city. This number must be between 0 and F-1, where F is the number 
    given in I5; otherwise SEGCALC will ask for a new number.
    If the number is 0, it indicates that this column is not available. 
I7. The number of the column that contains the x-coordinates of the 
    areal units' centres. This number must be between 0 and F-1, where
    F is the number given in I5; otherwise SEGCALC will ask for a new
    number. If the number is 0, it indicates that this column is not
    available and the inputs I8-I10 are omitted.
I8. The number of the column that contains the y-coordinates of the 
    areal units' centres. This number must be between 0 and F-1, where
    F is the number given in I5; otherwise SEGCALC will ask for a new
    number. If the number is 0, it indicates that this column is not
    available and the inputs I9 and I10 are omitted.
I9. The number of the column that contains the x-coordinate of the 
    central business district. This number must be between 0 and F-1,
    where F is the number given in input I5. If the number is 0, it
    indicates that this column is not available and the input I10 
    is omitted.
I10. The number of the column that contains the y-coordinate of the 
    central business district. This number must be between 0 and F-1,
    where F is the number given in input I5. If the number is 0, it
    indicates that this column is not available. 
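
The numeric ranges stated for I3 to I10 can be summarized in a short Python
sketch. The function name and the message wording are hypothetical; SEGCALC
itself re-prompts for each number interactively rather than collecting
problems in a list.

```python
def check_inputs(n_rows, n_cols, first_pop,
                 area=0, ctx=0, cty=0, cbdx=0, cbdy=0):
    """Return a list of range violations; an empty list means all are in range.

    Arguments correspond to I3, I4, I5 and then I6..I10 (0 = not available).
    """
    problems = []
    if not 1 <= n_rows <= 2000:
        problems.append("I3 must be between 1 and 2000")
    if not 2 <= n_cols <= 60:
        problems.append("I4 must be between 2 and 60")
    if not 1 <= first_pop <= n_cols - 1:
        problems.append("I5 must be between 1 and C-1")
    geo = (("I6", area), ("I7", ctx), ("I8", cty), ("I9", cbdx), ("I10", cbdy))
    for name, col in geo:
        # Geographic columns must precede the first population column F.
        if not 0 <= col <= first_pop - 1:
            problems.append(name + " must be between 0 and F-1")
    return problems

# The recommended layout of Section 2 with two segment columns (8 in total).
problems = check_inputs(3, 8, 7, area=2, ctx=3, cty=4, cbdx=5, cbdy=6)
```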

After obtaining the inputs I1--I10, SEGCALC reads the input file and 
reports the number of data-lines read and the number of valid data-lines. 
The numbers of the invalid data-lines, if any, will be recorded in
the output file specified in I2, but will not be shown on the screen.
If the number of data-lines is greater than the number of rows N given in 
I3, SEGCALC will not process the data-lines that appear after the N-th line 
of the file. On the other hand, if the number of data-lines read is smaller
than N, SEGCALC will continue its execution.

If the total population of a group in the city is zero, SEGCALC reports
that this group will not be considered in any of the geographical
indices. Moreover, if the total land area of the city is zero, SEGCALC
will set the column number given in I6 to zero and will continue
its execution assuming that that column number is not available. 

After the input file has been read, the following menu is shown 
on the screen:

 ____________________________________________________________________________
|UNEVENNESS:                                                                 |
| 1. D (Dissimilarity Index)        2. GINI  (Gini Coefficient)              |
| 3. H (Entropy Index)              4. ATKIN (Atkinson's Index)              |
|EXPOSURE:                                                                   |
| 5. XPy (Interaction Index)        6. XPx (Isolation Index)                 |
| 7. V (Correlation Ratio)                                                   |
|CONCENTRATION:                                                              |
| 8. DEL (Delta Index)              9. ACO (Absolute Concentration Index)    |
|10. RCO (Relative Concentration Index)                                      |
|CENTRALIZATION:                                                             |
|11. PCC (Proportion Central City)                                           |
|12. ACE (Absolute Centr. Index)   13. RCE (Relative Centr. Index)           |
|CLUSTERING:                                                                 |
|14. ACL (Absolute Clust. Index)   15. SP (White's Index Spatial Proximity)  |
|16. RCL (Relative Clust. Index)                                             |
|17. DPxy (Morgan's Distance Decay Interaction Index)                        |
|18. DPxx (Morgan's Distance Decay Isolation Index)                          |
|____________________________________________________________________________|
|    19. compute ALL INDICES  |   20. process NEW file  |    21. QUIT        |
|_____________________________|_________________________|____________________|
 
 select operation (1 to 21): 

The user can enter a number from 1 to 18 to get the value of the
corresponding geographical index for every group in the table whose
total population is at least one. SEGCALC will report that an index
cannot be computed if it requires information which is not available
(e.g., when one of the numbers given in I6--I10 is zero). In this
case, no record is written in the output file.
If the Atkinson index is selected, the user is asked to give a list
of parameters for each computation of that index. To terminate the
list, a 0 or 1 is required. For the PCC index, a list of area numbers is
required that represent the city's central areas. It is assumed
that the areas are sorted in increasing order of their distance from
the CBD. The list will terminate when 0 is entered. The Atkinson
or PCC index will not be computed if the first input terminates
the list.
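
The list-termination rule above can be sketched in Python as follows. The
function name is hypothetical and the example values are made up for
illustration; only the terminators (0 or 1 for Atkinson, 0 for PCC) come
from the description above.

```python
def take_until(values, terminators):
    """Collect entries until the first terminator; later entries are ignored."""
    collected = []
    for v in values:
        if v in terminators:
            break
        collected.append(v)
    return collected

# Atkinson parameters: the list ends at the first 0 or 1.
atkinson_params = take_until([0.1, 0.5, 0.9, 0], {0, 1})
# PCC central areas: the list ends at the first 0.
central_areas = take_until([3, 1, 2, 0], {0})
# An empty result means the index is not computed.
skipped = take_until([0, 0.5], {0, 1}) == []
```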

If 19 is selected, SEGCALC will compute all the indices for which the
required information is available. Before it begins the computation
it will ask the user to enter the lists for the Atkinson index and 
for the PCC index (if there is sufficient information to compute it)
as described above.

If 20 is selected, SEGCALC will request again the sequence of inputs I1--I10 
in order to process a new data file. Finally, SEGCALC will terminate when 
the user selects 21.



5. ERROR MESSAGES AND WARNINGS

This section lists all the possible error messages and warnings of
SEGCALC.

"ERROR-0: could not open file"
The message occurs if the input file specified in I1 does not exist,
or the system cannot create the output file specified in I2.

"ERROR-2: out of space"
This message can occur at the beginning of SEGCALC's execution, if the 
number of data-lines in the input file is large and the system's memory 
is limited. In this case SEGCALC terminates. In rather rare cases, 
the message could appear if some intermediate computations require memory 
that is not available. Then, SEGCALC will not perform these computations 
and will return to the main menu.

"ERROR-3: illegal number"
The message occurs if a number is requested and the user enters
something that is not a number, or a number outside the
expected range. In either case, the user will be asked to give a correct
number.

"ERROR-4: could not compute <xxx> index.
(the column numbers of the areas' centroid coordinates are zero.)"
The message occurs if the input I7 or I8 is zero and the user requested
to compute the index <xxx> based on that input. The values of <xxx>
can be: ACL, RCL, SP, DPxx*, DPxy*.

"ERROR-6: could not compute <xxx> index.
(the column number of the city's land areas is zero.)"
The message occurs if the input I6 is zero and the user requested
to compute the index <xxx> based on that input. The values of <xxx>
can be: DELTA, ACO, RCO, ACE.

"ERROR-7: could not compute <xxx> index.
(the column numbers of the CBD centroid coordinates are zero.)"
The message occurs if the input I9 or I10 is zero and the user requested
to compute the index <xxx> based on that input. The values of <xxx>
can be: ACE, RCE, PCC.
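
The relationship between the inputs I6--I10 and ERROR-4, ERROR-6 and ERROR-7
can be tabulated in a short Python sketch. The index lists are copied from
the three descriptions above; the function name is hypothetical, and the
order in which several applicable codes are returned need not match what
SEGCALC actually reports.

```python
NEEDS_CENTROIDS = {"ACL", "RCL", "SP", "DPxx*", "DPxy*"}  # inputs I7, I8
NEEDS_AREA = {"DELTA", "ACO", "RCO", "ACE"}               # input I6
NEEDS_CBD = {"ACE", "RCE", "PCC"}                         # inputs I9, I10

def expected_errors(index, area=0, ctx=0, cty=0, cbdx=0, cbdy=0):
    """Return the error codes the descriptions above associate with `index`.

    Column numbers follow the I6..I10 convention: 0 means "not available".
    """
    errors = []
    if index in NEEDS_CENTROIDS and (ctx == 0 or cty == 0):
        errors.append("ERROR-4")
    if index in NEEDS_AREA and area == 0:
        errors.append("ERROR-6")
    if index in NEEDS_CBD and (cbdx == 0 or cbdy == 0):
        errors.append("ERROR-7")
    return errors

# ACE depends on both the land areas and the CBD coordinates.
ace_errors = expected_errors("ACE", area=0, ctx=3, cty=4, cbdx=5, cbdy=6)
```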

"WARNING-0: the total land area of the city is not a positive number.
No index based on the land areas can be computed."
This warning can occur after the user has entered inputs I1--I10 and
the input I6 is not zero. The indices that cannot be computed are 
those listed in ERROR-6.

"WARNING-1: the total population of column <x> is not a positive number.
This column will not be used in any of the operations."
This warning can occur after the user has entered inputs I1--I10.
Column <x> is one of the population columns, i.e., its number lies between
the numbers given in I5 and I4.



6. SYSTEM REQUIREMENTS
...... to be completed ........



REFERENCES

Massey, D. S. and Denton, N. A. (1988) `The dimensions of residential
  segregation', Social Forces 67, 281--315.
Morgan, B. S. (1983) `A distance-decay based interaction index to
  measure residential segregation', Area 15, 211--217.
Ours....