SEGREGATION INDEX CALCULATOR (SEGCALC)
1. INTRODUCTION
Segregation Index Calculator (SEGCALC) was developed by Dr. Stavros
Konstantinidis (Dept. of Mathematics and Statistics) and Dr. Ivan
Townshend (Dept. of Geography) at the University of Lethbridge (Alberta)
during the months of May and July of 1997. The motivation for this project
was the need of researchers to compute geographical indices for evaluating
the degree of segregation of a city's population groups.
SEGCALC implements the following 18 indices:
(a) ACE (absolute centralization), (b) ACL (absolute clustering),
(c) ACO (absolute concentration), (d) Atkinson's,
(e) Delta, (f) Dissimilarity,
(g) DPxx* (distance decay isolation), (h) DPxy* (distance decay interaction),
(i) Entropy, (j) Eta squared (correlation ratio),
(k) Gini coefficient, (l) PCC (proportion central city),
(m) RCE (relative centralization), (n) RCL (relative clustering),
(o) RCO (relative concentration), (p) SP (spatial proximity),
(q) xP*x (isolation), (r) xP*y (interaction).
The formulas used to compute the above indices are taken from Massey
and Denton (1988) with the exception of DPxx* and DPxy* which are
implemented according to the correct formula given in Morgan (1983).
For the indices that use the contiguity between areal units as a
function of the distance between their centroid coordinates, SEGCALC
assumes that the contiguity between an areal unit and itself is equal
to 1 (see page 294 of Massey and Denton, 1988).
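The self-contiguity convention can be illustrated with a minimal sketch. It assumes the negative-exponential form c_ij = exp(-d_ij) described by Massey and Denton (1988), with d_ij the Euclidean distance between centroids. The function name `contiguity` is hypothetical and is not taken from SEGCALC's source.

```python
import math

def contiguity(centroid_a, centroid_b):
    """Contiguity between two areal units from their centroid coordinates.

    Assumed form: c_ij = exp(-d_ij), where d_ij is the Euclidean distance.
    A unit compared with itself has d = 0, so its contiguity is exp(0) = 1.
    """
    distance = math.dist(centroid_a, centroid_b)
    return math.exp(-distance)

tract_100 = (1.0, 10.0)   # centroid in km
tract_101 = (4.5, 10.5)   # centroid in km

self_value = contiguity(tract_100, tract_100)   # exactly 1.0
pair_value = contiguity(tract_100, tract_101)   # small positive number

# The same two tracts with coordinates expressed in metres:
pair_value_metres = contiguity((1000.0, 10000.0), (4500.0, 10500.0))
```

Under this assumed form the result depends strongly on the coordinate unit: with the coordinates in metres the exponential underflows to 0.0, whereas in kilometres it remains a small positive number.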
While testing the effectiveness of SEGCALC, the following problems were
identified (see also Townshend, Konstantinidis and Walker (in progress) for further details):
- It is possible that the ACL index produces negative values, despite
the claim of Massey and Denton that its values always range between
0 and 1.
- There are situations where the values of the indices ACO and RCO are
undefined (i.e., the formulas produce 0/0).
The rest of this document describes the capabilities of SEGCALC.
Section 2 gives some general suggestions of how to structure
the input data files used by SEGCALC. Section 3 contains further
details on the input data files and defines which data lines are
considered valid and processed by SEGCALC. Section 4 describes the
set of permissible user inputs and how SEGCALC responds to those.
Section 5 lists the possible error messages and warnings of SEGCALC.
Finally, Section 6 states the operating system requirements to use
SEGCALC.
2. RECOMMENDED INPUT DATA FILE CONFIGURATION
To compute ALL indices in SEGCALC it is advisable (but not essential) to
set up your raw data file (ASCII format) as follows:
Column 1: unique identifier for row or record (e.g. census tract code or
name). This column is not used in the computations but is recommended as
a record identifier.
Column 2 (referred to as I6 below): Area of the census tract (note: SEGCALC
assumes that the area of the city is the sum of all census tract areas).
If you are not using the area-based indices (DEL, ACO, RCO, ACE) you can
omit this column.
Column 3 (referred to as I7 below): the geographic x-coordinate of the
census tract centroid (e.g. metres or km).
Note: The distance-based indices (PCC, RCE, ACE, ACL, RCL, SP, DPxy, DPxx)
are sensitive to the unit of measurement.
If you are not using indices based on distances (from each other or from
the central business district, CBD) you can omit columns 3-6.
Column 4 (referred to as I8 below): the geographic y-coordinate of the
census tract centroid (same units as x coordinate above).
If you are not using indices (PCC, RCE, ACE, ACL, RCL, SP, DPxy, DPxx)
based on distances (from each other or from the CBD) you can omit
columns 3-6.
Column 5 (referred to as I9 below): the x coordinate of the central
business district (CBD) or city centre (must be same unit of measure as
the tract coordinates). If you are not using indices (PCC, RCE, ACE, ACL,
RCL, SP, DPxy, DPxx) based on distances (from each other or from the CBD)
you can omit columns 3-6.
Column 6 (referred to as I10 below): the y coordinate of the central
business district or city centre
(must be same unit of measure as the tract coordinates).
If you are not using indices (PCC, RCE, ACE, ACL, RCL, SP, DPxy, DPxx)
based on distances (from each other or from the CBD) you can omit
columns 3-6.
Column 7 (referred to as I5 below): First column of data (i.e. population
segment such as age group, ethnic group, etc.)
Columns 8 to LAST column: remaining population segments. Note: the number
of population segment columns (cols 7 to LAST) plus the number of ID and
geographic reference columns (cols 1 to 6) equals the total number of
columns in the input data file (referred to as I4 below).
IMPORTANT:
SEGCALC indices are all based on dichotomous analysis: i.e., segment vs.
non-segment, where non-segment is defined as the total tract population
minus the segment. E.g.,
if segment = Seniors, non-segment = total tract population minus Seniors.
if segment = Blacks, non-segment = total tract population minus Blacks.
SEGCALC does not make provision for reading (from the raw data input file) a
separate column that contains the total tract populations. SEGCALC computes
the total tract population as the sum of the individual segments (e.g.
cols 7 to LAST). This means that if your analysis is only concerned with
one population segment, you can simplify the input data file by simply
entering two columns of data: the first for the population of that segment,
and the second as the difference between total population and that segment.
In all cases, AT LEAST TWO segment data columns are required.
SEGCALC also requires complete (valid) records. Records containing missing
data will not be processed.
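The dichotomous split described above can be sketched as follows. This is only an illustration of the arithmetic; the function name `split_segment` is hypothetical, and the tract total is taken to be the sum of the segment columns, as stated above.

```python
def split_segment(segment_counts, segment_index):
    """Return (segment, non_segment) for one tract.

    segment_counts: populations of all segment columns of the tract
                    (columns 7 to LAST in the recommended layout).
    segment_index:  0-based position of the segment of interest.
    The tract total is the sum of the segment columns, so the non-segment
    is that total minus the chosen segment.
    """
    if len(segment_counts) < 2:
        raise ValueError("at least two segment columns are required")
    total = sum(segment_counts)
    segment = segment_counts[segment_index]
    return segment, total - segment

# One tract with two columns: Seniors, and everyone else.
seniors, non_seniors = split_segment([300, 700], 0)

# One tract with three segments, analysing the second one.
group, rest = split_segment([300, 200, 500], 1)
```

With two columns the non-segment is simply the second column, which is why a single-segment analysis needs only the segment itself and its complement.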
EXAMPLE DATA FILE: (note that a fixed-field format is not required)
Ctname  Area(km2)  CTX  CTY   CBDX  CBDY  SEGMENT1  SEGMENT2  ...  SEGMENTM
100     8          1.0  10.0  5     6     300       700       ...  xxx
101     15         4.5  10.5  5     6     1000      5000      ...  xxx
.       .          .    .     .     .     .         .              .
.       .          .    .     .     .     .         .              .
.       .          .    .     .     .     .         .              .
200     21         3.5  1.5   5     6     600       2000      ...  xxx
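As a worked illustration, the sketch below computes the dissimilarity index for the three tracts shown in the example, using the standard form D = 0.5 * sum(|x_i/X - y_i/Y|) given by Massey and Denton (1988). It assumes, purely for illustration, that the file contains only the SEGMENT1 and SEGMENT2 columns, so that SEGMENT2 is the non-segment of SEGMENT1. The function name is hypothetical and the code is not SEGCALC's own implementation.

```python
def dissimilarity(segment, non_segment):
    """Dissimilarity index D = 0.5 * sum(|x_i/X - y_i/Y|) over areal units.

    segment, non_segment: per-tract populations of the two groups.
    X and Y are the city-wide totals of the two groups.
    """
    total_segment = sum(segment)
    total_non_segment = sum(non_segment)
    if total_segment == 0 or total_non_segment == 0:
        raise ValueError("both groups need a positive city-wide population")
    return 0.5 * sum(
        abs(x / total_segment - y / total_non_segment)
        for x, y in zip(segment, non_segment)
    )

# Tracts 100, 101 and 200 from the example, two segment columns only.
segment1 = [300, 1000, 600]
segment2 = [700, 5000, 2000]
d_value = dissimilarity(segment1, segment2)   # about 0.123
```

A value of 0 means the two groups are distributed identically across tracts, and a value of 1 means they share no tract.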
3. TECHNICAL INPUT DATA FILE SPECIFICATIONS
SEGCALC processes tabular data that describe the geographical distribution
of the population groups in a given city (see table below). The city in
question consists of N areal units and its population is made up of M
different groups.
g11 g12 ..... g1L p11 p12 ..... p1M
g21 g22 ..... g2L p21 p22 ..... p2M
... ... ..... ... ... ... ..... ...
... ... ..... ... ... ... ..... ...
gN1 gN2 ..... gNL pN1 pN2 ..... pNM
The rows of the table represent the city's areal units. Each row i consists
of two components:
- The geography component (referred to as G-component in the sequel) with
elements gi1, gi2,..., giL that represent data about the geography of
the areal unit. For example, a typical G-component consists of 5 numbers:
(a) the x-coordinate of the unit's centre, (b) the y-coordinate of the
unit's centre, (c) the land area of the unit, (d) the x-coordinate of
the central business district, and (e) the y-coordinate of the central
business district. Note that the pair of numbers in (d) and (e) would be
the same in every areal unit; their purpose is to compute the distance
between the unit's centre and the centre of the central business district.
- The population component (referred to as P-component in the sequel) with
elements pi1, pi2,..., piM that represent the populations of the M
groups living in the unit.
SEGCALC uses the following terminology and assumptions about the rows of
the table and the lines of the input file:
1. A data-line is a line of the input file that contains at least one
non-whitespace character (whitespace characters are the space, the
horizontal tab, and the new-line character).
2. Each row of the table occupies exactly one data-line of the input file.
3. The last row of the table corresponds to the data-line that
immediately precedes the first non-data-line that appears in the file
(if such a line exists), or the end-of-file character.
4. The elements of a data-line are delimited by at least one whitespace
character.
5. In every row, the elements of the G-component precede the elements of
the P-component.
SEGCALC reads the lines of the input file until the end-of-file character
is encountered or the first non-data-line is read. Only the valid data-lines
are processed, however. The validity of a data-line depends on the user's
specifications as follows. Near the beginning of its execution, SEGCALC
asks the user to enter the total number, L+M, of columns in the table,
and the column number, L+1, of the first population group. Then, a
data-line is considered valid if and only if
- the data-line consists of exactly L+M elements, and
- each element is a number, and
- each element of the last M columns is a non-negative integer, and
- the sum of the last M elements of the line is a positive integer (i.e.,
the total population of the areal unit is at least one), and
- the land area of the row is positive, provided that the column
of the land areas is included in the input file.
In any other case, the data-line is considered invalid and not processed
by SEGCALC. For example, a data-line is invalid if it contains a letter,
or it contains more than L+M elements, or it consists of fewer than
L+M elements, or the number in column L+1 is negative, and so on.
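The validity rules above can be expressed as a short check. This is a sketch under stated assumptions, not SEGCALC's parser: the function and parameter names are hypothetical, geography elements are accepted if Python can read them as finite floating-point numbers, and population elements must be written as plain digit strings (so "300" is accepted but "300.0" is not, which may differ from SEGCALC's behaviour).

```python
import math

def is_valid_data_line(line, total_cols, first_pop_col, area_col=0):
    """Apply the data-line validity rules to one line of the input file.

    total_cols:    L+M, the total number of columns (input I4).
    first_pop_col: L+1, the 1-based column of the first group (input I5).
    area_col:      1-based column of the land areas (input I6), 0 if absent.
    """
    fields = line.split()                      # whitespace-delimited elements
    if len(fields) != total_cols:              # exactly L+M elements
        return False
    geo_fields = fields[:first_pop_col - 1]
    pop_fields = fields[first_pop_col - 1:]
    try:
        geo_values = [float(f) for f in geo_fields]
    except ValueError:                         # an element is not a number
        return False
    if not all(math.isfinite(v) for v in geo_values):
        return False
    # Population elements must be non-negative integers (assumed digit-only).
    if not all(f.isascii() and f.isdigit() for f in pop_fields):
        return False
    if sum(int(f) for f in pop_fields) < 1:    # total population at least one
        return False
    if area_col > 0 and geo_values[area_col - 1] <= 0:
        return False                           # land area must be positive
    return True

ok = is_valid_data_line("100 8 1.0 10.0 5 6 300 700", 8, 7, area_col=2)
header = is_valid_data_line(
    "Ctname Area(km2) CTX CTY CBDX CBDY SEGMENT1 SEGMENT2", 8, 7, area_col=2
)
```

Here `ok` is True, while `header` is False because a column-title line contains letters and is therefore treated as an invalid data-line.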
4. USING SEGCALC
To run SEGCALC, the user types "SEGCALC" at the operating system prompt.
Then, the following information is requested from the user:
I1. The name of the input file that contains the table of geographical data.
SEGCALC will ask for another name if the file cannot be found.
I2. The name of the output file in which the results will be stored. If
no name is given the results are only shown on the screen.
I3. The number of rows in the table (i.e., the number of areal units in
the city). This number must be between 1 and 2000; otherwise SEGCALC
will ask for a new number. You can enter a number in excess of the
actual number of records in your file. Do not enter a number less than
the number of records in your file, or SEGCALC will not process
the records beyond that number.
I4. The total number of columns in the table (i.e., the number of elements
in the G-component plus the number of elements in the P-component of
each row). This number must be between 2 and 60; otherwise SEGCALC will
ask for a new number. If the number is 2, the inputs I5 to I10 will
be omitted. (Note: the total number of columns must include all those
which are placed before your first column of SEGMENT data).
I5. The column number, say F, of the first population group (segment)
in the table.
This number must be between 1 and C-1, where C is the number given
in I4; otherwise SEGCALC will ask for a new number. If F is 1, the
inputs I6--I10 are omitted.
I6. The number of the column that contains the sizes of the land areas in
the city. This number must be between 0 and F-1, where F is the number
given in I5; otherwise SEGCALC will ask for a new number.
If the number is 0, it indicates that this column is not available.
I7. The number of the column that contains the x-coordinates of the
areal units' centres. This number must be between 0 and F-1, where
F is the number given in I5; otherwise SEGCALC will ask for a new
number. If the number is 0, it indicates that this column is not
available and the inputs I8-I10 are omitted.
I8. The number of the column that contains the y-coordinates of the
areal units' centres. This number must be between 0 and F-1, where
F is the number given in I5; otherwise SEGCALC will ask for a new
number. If the number is 0, it indicates that this column is not
available and the inputs I9 and I10 are omitted.
I9. The number of the column that contains the x-coordinate of the
central business district. This number must be between 0 and F-1,
where F is the number given in input I5. If the number is 0, it
indicates that this column is not available and the input I10
is omitted.
I10. The number of the column that contains the y-coordinate of the
central business district. This number must be between 0 and F-1,
where F is the number given in input I5. If the number is 0, it
indicates that this column is not available.
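The numeric ranges accepted for inputs I3 to I10 can be summarised in a small check. This is a sketch only; the function name and the dictionary keyed by input label are hypothetical, and the sketch does not model the rules about which inputs are omitted.

```python
def check_inputs(n_rows, total_cols, first_pop_col, geo_columns):
    """Return the labels of the inputs whose values are out of range.

    n_rows:        input I3, must be between 1 and 2000.
    total_cols:    input I4 (C), must be between 2 and 60.
    first_pop_col: input I5 (F), must be between 1 and C-1.
    geo_columns:   dict such as {"I6": 2, "I7": 3, ...}; each value must
                   be between 0 and F-1, where 0 means "not available".
    """
    problems = []
    if not 1 <= n_rows <= 2000:
        problems.append("I3")
    if not 2 <= total_cols <= 60:
        problems.append("I4")
    if not 1 <= first_pop_col <= total_cols - 1:
        problems.append("I5")
    for label, column in geo_columns.items():
        if not 0 <= column <= first_pop_col - 1:
            problems.append(label)
    return problems

# The recommended layout of Section 2 with two population segments.
layout = {"I6": 2, "I7": 3, "I8": 4, "I9": 5, "I10": 6}
no_problems = check_inputs(100, 8, 7, layout)
```

For the recommended layout the returned list is empty; SEGCALC itself asks for a new number as soon as one input is out of range, rather than collecting all problems at once.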
After obtaining the inputs I1--I10, SEGCALC reads the input file and
reports the number of data-lines read and the number of valid data-lines.
The numbers of the invalid data-lines, if any, will be recorded in
the output file specified in I2, but will not be shown on the screen.
If the number of data-lines is greater than the number of rows N given in
I3, SEGCALC will not process the data-lines that appear after the N-th line
of the file. On the other hand, if the number of data-lines read is smaller
than N, SEGCALC will continue its execution.
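The reading behaviour described above can be sketched as a loop. The names are hypothetical, and the sketch assumes that data-lines beyond the N-th are neither counted nor processed; the text does not say whether SEGCALC includes them in the count it reports.

```python
def read_rows(lines, max_rows, is_valid):
    """Read data-lines until a non-data-line, end of input, or max_rows.

    lines:    iterable of text lines from the input file.
    max_rows: the number N given in I3.
    is_valid: callable deciding whether one data-line is valid.
    Returns (number of data-lines read, valid rows, invalid line numbers).
    """
    read_count = 0
    valid_rows = []
    invalid_line_numbers = []
    for line in lines:
        if not line.strip():          # first non-data-line ends the table
            break
        if read_count == max_rows:    # lines after the N-th are ignored
            break
        read_count += 1
        if is_valid(line):
            valid_rows.append(line.split())
        else:
            invalid_line_numbers.append(read_count)
    return read_count, valid_rows, invalid_line_numbers

def digits_only(text):
    """Stand-in validity rule used only for this illustration."""
    return all(token.isdigit() for token in text.split())

sample = ["1 2", "a b", "3 4", "", "5 6"]
all_rows = read_rows(sample, 10, digits_only)
capped = read_rows(sample, 2, digits_only)
```

In `all_rows` the blank fourth line ends the table, so the line "5 6" is never read; in `capped` only the first two data-lines are considered.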
If the total population of a group in the city is zero, SEGCALC reports
that this group will not be considered in any of the geographical
indices. Moreover, if the total land area of the city is zero, SEGCALC
will set the column number given in I6 to zero and will continue
its execution as if the land-area column were not available.
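The exclusion of groups whose city-wide population is zero can be sketched as follows. The function name is hypothetical, and the rows are assumed to hold only the P-component of each valid data-line.

```python
def usable_group_columns(population_rows):
    """Return the 0-based positions of groups with a positive city total.

    population_rows: one list of group populations per areal unit.
    Groups whose total over all areal units is zero are left out.
    """
    totals = [sum(column) for column in zip(*population_rows)]
    return [index for index, total in enumerate(totals) if total > 0]

rows = [
    [300, 0, 700],
    [1000, 0, 5000],
    [600, 0, 2000],
]
kept = usable_group_columns(rows)   # the middle group is dropped
```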
After the input file has been read, the following menu is shown
on the screen:
 ____________________________________________________________________________
|UNEVENNESS:                                                                 |
| 1. D (Dissimilarity Index)          2. GINI (Gini Coefficient)             |
| 3. H (Entropy Index)                4. ATKIN (Atkinson's Index)            |
|EXPOSURE:                                                                   |
| 5. XPy (Interaction Index)          6. XPx (Isolation Index)               |
| 7. V (Correlation Ratio)                                                   |
|CONCENTRATION:                                                              |
| 8. DEL (Delta Index)                9. ACO (Absolute Concentration Index)  |
|10. RCO (Relative Concentration Index)                                      |
|CENTRALIZATION:                                                             |
|11. PCC (Proportion Central City)                                           |
|12. ACE (Absolute Centr. Index)     13. RCE (Relative Centr. Index)         |
|CLUSTERING:                                                                 |
|14. ACL (Absolute Clust. Index)     15. SP (White's Index Spatial Proximity)|
|16. RCL (Relative Clust. Index)                                             |
|17. DPxy (Morgan's Distance Decay Interaction Index)                        |
|18. DPxx (Morgan's Distance Decay Isolation Index)                          |
|____________________________________________________________________________|
| 19. compute ALL INDICES     | 20. process NEW file    | 21. QUIT           |
|_____________________________|_________________________|____________________|
select operation (1 to 21):
The user can enter a number from 1 to 18 to get the value of the
corresponding geographical index for every group in the table whose
total population is at least one. SEGCALC will report that an index
cannot be computed if it requires information which is not available
(e.g., when one of the numbers given in I6--I10 is zero). In this
case, no record is written in the output file.
If the Atkinson index is selected, the user is asked to give a list
of parameters for each computation of that index. To terminate the
list, a 0 or 1 is required. For the PCC index, a list of area numbers is
required that represent the city's central areas. It is assumed
that the areas are sorted in increasing order of their distance from
the CBD. The list will terminate when 0 is entered. The Atkinson
or PCC index will not be computed if the first input terminates
the list.
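Both lists are terminated by a sentinel value, which can be sketched as below. The function name is hypothetical and the sample values are illustrative only; they are not recommended parameters.

```python
def take_until_sentinel(values, sentinels):
    """Collect entries until the first sentinel value is met.

    For the Atkinson parameters the sentinels are 0 and 1;
    for the PCC list of central area numbers the sentinel is 0.
    An empty result means the index would not be computed.
    """
    collected = []
    for value in values:
        if value in sentinels:
            break
        collected.append(value)
    return collected

atkinson_parameters = take_until_sentinel([0.25, 0.5, 0.9, 0, 0.3], {0, 1})
central_areas = take_until_sentinel([3, 7, 12, 0], {0})
nothing_entered = take_until_sentinel([0], {0})
```

Entries that follow the sentinel, such as the 0.3 in the first call, are never read.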
If 19 is selected, SEGCALC will compute all the indices for which the
required information is available. Before it begins the computation
it will ask the user to enter the lists for the Atkinson index and
for the PCC index (if there is sufficient information to compute it)
as described above.
If 20 is selected, SEGCALC will request again the sequence of inputs I1--I10
in order to process a new data file. Finally, SEGCALC will terminate when
the user selects 21.
5. ERROR MESSAGES AND WARNINGS
This section lists all the possible error messages and warnings of
SEGCALC.
"ERROR-0: could not open file"
The message occurs if the input file specified in I1 does not exist,
or the system cannot create the output file specified in I2.
"ERROR-2: out of space"
This message can occur at the beginning of SEGCALC's execution, if the
number of data-lines in the input file is large and the system's memory
is limited. In this case SEGCALC terminates. In rather rare cases,
the message could appear if some intermediate computations require memory
that is not available. Then, SEGCALC will not perform these computations
and will return to the main menu.
"ERROR-3: illegal number"
The message occurs if a number is requested and the user enters
something that is not a number, or a number outside the
expected range. In either case, the user will be asked to give a correct
number.
"ERROR-4: could not compute index.
(the column numbers of the areas' centroid coordinates are zero.)"
The message occurs if the input I7 or I8 is zero and the user requested
to compute an index based on that input. The indices affected are:
ACL, RCL, SP, DPxx*, DPxy*.
"ERROR-6: could not compute index.
(the column number of the city's land areas is zero.)"
The message occurs if the input I6 is zero and the user requested
to compute an index based on that input. The indices affected are:
DELTA, ACO, RCO, ACE.
"ERROR-7: could not compute index.
(the column numbers of the CBD centroid coordinates are zero.)"
The message occurs if the input I9 or I10 is zero and the user requested
to compute an index based on that input. The indices affected are:
ACE, RCE, PCC.
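The relation between missing columns and ERROR-4, ERROR-6 and ERROR-7 can be summarised in a lookup. The index lists are those given in the three messages above; the function name is hypothetical, and returning every applicable error in numeric order is an assumption, since the text does not say which message SEGCALC shows when more than one applies.

```python
NEEDS_CENTROIDS = {"ACL", "RCL", "SP", "DPxx*", "DPxy*"}   # ERROR-4
NEEDS_AREA = {"DELTA", "ACO", "RCO", "ACE"}                # ERROR-6
NEEDS_CBD = {"ACE", "RCE", "PCC"}                          # ERROR-7

def blocking_errors(index, i6, i7, i8, i9, i10):
    """Return the error codes that prevent computing the given index.

    i6..i10 are the column numbers entered for inputs I6 to I10,
    where 0 means that the column is not available.
    """
    errors = []
    if index in NEEDS_CENTROIDS and (i7 == 0 or i8 == 0):
        errors.append("ERROR-4")
    if index in NEEDS_AREA and i6 == 0:
        errors.append("ERROR-6")
    if index in NEEDS_CBD and (i9 == 0 or i10 == 0):
        errors.append("ERROR-7")
    return errors

# Land areas and CBD coordinates missing, tract centroids present.
ace_errors = blocking_errors("ACE", 0, 3, 4, 0, 0)
# Only the land-area column present.
sp_errors = blocking_errors("SP", 2, 0, 0, 0, 0)
delta_errors = blocking_errors("DELTA", 2, 0, 0, 0, 0)
```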
"WARNING-0: the total land area of the city is not a positive number.
No index based on the land areas can be computed."
This warning can occur after the user has entered inputs I1--I10 and
the input I6 is not zero. The indices that cannot be computed are
those listed in ERROR-6.
"WARNING-1: the total population of column is not a positive number.
This column will not be used in any of the operations."
This warning can occur after the user has entered inputs I1--I10.
The column in question is a population column, i.e., its number lies
between the numbers given in I5 and I4.
6. SYSTEM REQUIREMENTS
...... to be completed ........
REFERENCES
Massey, D. S. and Denton, N. A. (1988) `The dimensions of residential
segregation', Social Forces 67, 281--315.
Morgan, B. S. (1983) `A distance-decay based interaction index to
measure residential segregation', Area 15, 211--217.
Ours....