...
The
...
AVS
...
addresses
...
are
...
imported
...
into
...
EAS
...
via
...
a
...
one
...
time
...
ETL.
...
Once
...
EAS
...
goes
...
live,
...
this
...
ETL
...
will
...
be
...
turned
...
off.
...
Here
...
I
...
describe
...
in
...
general
...
the
...
processing
...
that
...
occurs
...
during
...
the
...
ETL.
...
The
...
ETL
...
process
...
is
...
controlled
...
by
...
a
...
python
...
job
...
which
...
is
...
here
...
(jira
...
access
...
required)
...
...
If
...
you
...
do
...
not
...
have
...
access,
...
the
...
important
...
code
...
is
...
here
...
Code Block |
---|
job.commands = [
# load the data
FtpCommand(job=job, connectionGroup=self.connectionOrganizer.ftpConnections['DBI2SFGIS'], sourcePath='/dbi-external/tblAVS.txt', targetPath=csvFileIn, direction='GET'),
FmeCommand(job=job, fmw="avs_load/avs_to_eas.fmw", connectionGroups=(self.connectionGroups['MAD_ETL'],), useSdePorts=(False,), files=(csvFileIn,)),
FmeCommand(job=job, fmw="avs_load/parcels_sfgisStg_no_geom_to_eas.fmw", connectionGroups=(self.connectionGroups['SFGIS_STG_SDE'], self.connectionGroups['MAD_ETL'],), useSdePorts=(True,False,)),
SqlCommand(job=job, sqlStatement="select avs.delete_addresses();", connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="vacuum analyze;", connectionGroup=self.connectionGroups['MAD_ETL'], useTransaction=False),
SqlCommand(job=job, sqlStatement="select avs.init_parcel_points();", connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="select avs.init_streets_nearest();", connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="select avs.init_avs_addresses();", connectionGroup=self.connectionGroups['MAD_ETL']),
AvsLoadCommand(job=job, connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="update avs.avs_addresses set exception_text = 'programming error - row not processed' where address_base_id is null and exception_text is null;", connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="select public._eas_validate_addresses_after_etl('ALL');", connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="delete from public.d_address_disposition where disposition_description = 'provisional';", connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="vacuum analyze;", connectionGroup=self.connectionGroups['MAD_ETL'], useTransaction=False),
# report on results
RemoveDirectoryTreeCommand(job=job, targetDir=os.path.join(getDataFilePath(), 'avs_load_report')),
SqlCommand(job=job, sqlStatement="select avs.qa();", connectionGroup=self.connectionGroups['MAD_ETL']),
DumpTablesCommand(job=job, connectionGroup=self.connectionGroups['MAD_ETL'], tableNames=['avs.vw_load_summary', 'avs.vw_load_results', 'avs.qa'], targetDir=reportDir),
ZipCommand(job=job, sourceDir=os.path.join(getDataFilePath(), 'avs_load_report'), targetFile=os.path.join(getDataFilePath(), 'avs_load_report.zip')),
EmailCommand(job=job, subject='AVS Load Report', text='See attached.', attachFiles=[os.path.join(getDataFilePath(), 'avs_load_report.zip')]),
FtpCommand(job=job, connectionGroup=self.connectionOrganizer.ftpConnections['DBI2SFGIS'], sourcePath=os.path.join(reportDir, 'avs.vw_load_results.csv'), targetPath='/dbi-external/avsEasLoadResults.csv', direction='PUT')
]
{code}
|
This
...
python
...
code
...
calls
...
into
...
a
...
DB
...
proc
...
(avs.load)
...
which
...
then
...
does
...
the
...
heavy
...
lifting.
...
The
...
majority
...
of
...
the
...
other
...
code
...
is
...
here
...
...
with
...
these
...
being
...
the
...
key
...
pieces
...
...
...
...
...
...
...
The
...
execution
...
path
...
is
...
not
...
trivial
...
so
...
I'll
...
walk
...
through
...
the
...
process
...
in
...
english
...
(below).
...
By
...
necessity
...
I
...
do
...
a
...
lot
...
of
...
generalizing
...
but
...
I
...
also
...
link
...
to
...
the
...
code.
...
In
...
case
...
you
...
are
...
curious,
...
the
...
key
...
parts
...
of
...
the
...
EAS
...
data
...
model
...
are
...
here
...
...
OK,
...
off
...
we
...
go...
...
We
...
start
...
by
...
doing
...
to
...
blanket
...
validations
...
and
...
standardizing
...
some
...
values.
...
This
...
is
...
done
...
in
...
this
...
db
...
proc:
...
...
and
...
includes
...
the
...
following
...
'invalid
...
street
...
number
...
suffix'
...
The
...
domain
...
values
...
are
...
here
...
...
'street
...
name
...
does
...
not
...
exist'
...
The
...
domain
...
values
...
are
...
specified
...
by
...
DPW.
...
We
...
do
...
not
...
use
...
fuzzy
...
string
...
matching.
...
'street
...
suffix
...
does
...
not
...
exist
...
in
...
street
...
dataset'
...
The
...
domain
...
values
...
are
...
specified
...
by
...
DPW.
...
We
...
do
...
not
...
use
...
fuzzy
...
string
...
matching.
...
'street
...
-
...
street
...
suffix
...
combination
...
does
...
not
...
exist'
...
The
...
domain
...
values
...
are
...
specified
...
by
...
DPW.
...
We
...
do
...
not
...
use
...
fuzzy
...
string
...
matching.
...
'referenced
...
parcel
...
has
...
no
...
geometry'
...
parcel
...
data
...
from
...
DPW
...
'no
...
matching
...
block
...
-
...
lot'
...
parcel
...
data
...
from
...
DPW
...
'block
...
lot
...
values
...
are
...
inconsistent'
...
Check
...
for
...
consistency
...
across
...
the
...
columnes
...
block,
...
lot,
...
and
...
block-lot.
...
'length
...
of
...
concatenated
...
unit
...
num
...
exceeds
...
10'
...
We
...
concatenate
...
avsa.unit
...
and
...
avsa.unit_sfx
...
using
...
http://code.google.com/p/eas/source/browse/trunk/etl/sql/avs_load/f_concatenate_unit_ddl.sql
...
The
...
result
...
must
...
fit
...
into
...
char
...
10.
...
'invalid
...
end
...
date
...
value'
...
Some
...
of
...
the
...
end
...
date
...
values
...
cannot
...
be
...
cats
...
into
...
the
...
date
...
type.
...
We
...
also
...
standardize
...
dates,
...
"unit
...
number"
...
values,
...
and
...
street
...
number
...
suffixes.
...
At
...
this
...
point
...
we
...
have
...
"excepted"
...
addresses
...
that
...
we
...
know
...
we
...
cannot
...
process.
...
We
...
take
...
each
...
remaining
...
address
...
and
...
try
...
to
...
load
...
it
...
into
...
EAS.
...
This
...
proceeds
...
from
...
the
...
base
...
address,
...
to
...
the
...
unit
...
address,
...
to
...
the
...
unit
...
address
...
-
...
parcel
...
link.
...
This
...
is
...
detailed
...
here
...
in
...
...
The
...
main
...
load
...
proc
...
calls
...
into
...
these
...
procs
...
in
...
this
...
order
...
...
...
...
With
...
the
...
most
...
interesting
...
work
...
being
...
done
...
by
...
f_process_address_base.
...
I
...
will
...
describe
...
this
...
now
...
and
...
will
...
do
...
a
...
good
...
bit
...
of
...
generalizing.
...
We
...
try
...
to
...
use
...
an
...
existing
...
adddress
...
and
...
create
...
a
...
new
...
one
...
if
...
necessary.
...
If
...
we
...
create
...
a
...
new
...
one,
...
we
...
have
...
to
...
find
...
the
...
best
...
matching
...
street
...
segment.
...
This
...
is
...
easily
...
the
...
most
...
complicated
...
process.
...
The
...
segment
...
must
...
be
...
within
...
500
...
feet
...
of
...
the
...
specified
...
parcel,
...
and
...
the
...
street
...
name
...
and
...
street
...
suffix
...
must
...
match
...
the
...
source
...
data.
...
Again,
...
we
...
do
...
not
...
use
...
fuzzy
...
string
...
matching
...
(such
...
as
...
Levenshtein
...
etc).
...
If
...
all
...
goes
...
well,
...
we
...
insert
...
the
...
base
...
address.
...
There
...
are
...
various
...
constraints
...
that
...
protect
...
data
...
integrity,
...
specifically
...
to
...
prevent
...
duplicate
...
active
...
base
...
addresses.
...
You
...
can
...
see
...
the
...
trigger
...
enforced
...
constraints
...
here
...
...
If
...
the
...
ETL
...
finds
...
or
...
creates
...
a
...
base
...
address,
...
it
...
will
...
then
...
insert
...
the
...
unit
...
address
...
if
...
it
...
is
...
specified.
...
Again,
...
we
...
do
...
not
...
allow
...
multiple
...
active
...
duplicates.
...
...
Finally,
...
we
...
insert
...
an
...
"address
...
-
...
parcel
...
link".
...
Here
...
again
...
we
...
do
...
not
...
allow
...
duplicates.
...
...
At
...
the
...
end
...
of
...
all
...
this
...
processing
...
we
...
report
...
results
...
in
...
three
...
ways
...
- summary
...
- results
...
- detail
...
- on
...
- all
...
- AVS
...
- rows
...
- quality
...
- assurance
...
- on
...
- selected
...
- data
...
An
...
example
...
of
...
this
...
attached
...
here:
...
...
The
...
summary
...
results
...
are
...
shown
...
in
...
the
...
table
...
below.
...
These
...
results
...
are
...
from
...
2011-12-07.
...
The
...
first
...
row
...
represents
...
rows
...
from
...
AVS
...
that
...
were
...
successfully
...
inserted
...
from
...
AVS.
...
Wiki Markup |
---|
{csv}
exception_text_general,count,percentage
,324018,96.3671
referenced parcel has no geometry,5409,1.6087
no matching block - lot,3926,1.1676
insert rejected - address_x_parcels - the parcel is not within the block number range,1172,0.3486
your street suffix appears to be incorrect ,929,0.2763
unable to find nearby street ,357,0.1062
street name does not exist,321,0.0955
insert rejected - address_x_parcels - the parcel is not within within 500 feet of the address,87,0.0259
length of concatenated unit num exceeds 10,5,0.0015
block lot values are inconsistent,4,0.0012
street suffix does not exist in street dataset,3,0.0009
invalid end date value,2,0.0006
{csv} |
...