Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 4.0

...

The

...

AVS

...

addresses

...

are

...

imported

...

into

...

EAS

...

via

...

a

...

one

...

time

...

ETL.

...


Once

...

EAS

...

goes

...

live,

...

this

...

ETL

...

will

...

be

...

turned

...

off.

...


Here

...

I

...

describe

...

in

...

general

...

the

...

processing

...

that

...

occurs

...

during

...

the

...

ETL.

...

The

...

ETL

...

process

...

is

...

controlled

...

by

...

a

...

python

...

job

...

which

...

is

...

here

...

(jira

...

access

...

required)

...

...

If

...

you

...

do

...

not

...

have

...

access,

...

the

...

important

...

code

...

is

...

here

...

}
Code Block
        job.commands = [
            # load the data
            FtpCommand(job=job, connectionGroup=self.connectionOrganizer.ftpConnections['DBI2SFGIS'], sourcePath='/dbi-external/tblAVS.txt', targetPath=csvFileIn, direction='GET'),
            FmeCommand(job=job, fmw="avs_load/avs_to_eas.fmw", connectionGroups=(self.connectionGroups['MAD_ETL'],), useSdePorts=(False,), files=(csvFileIn,)),
            FmeCommand(job=job, fmw="avs_load/parcels_sfgisStg_no_geom_to_eas.fmw", connectionGroups=(self.connectionGroups['SFGIS_STG_SDE'], self.connectionGroups['MAD_ETL'],), useSdePorts=(True,False,)),
            SqlCommand(job=job, sqlStatement="select avs.delete_addresses();", connectionGroup=self.connectionGroups['MAD_ETL']),
            SqlCommand(job=job, sqlStatement="vacuum analyze;", connectionGroup=self.connectionGroups['MAD_ETL'], useTransaction=False),
            SqlCommand(job=job, sqlStatement="select avs.init_parcel_points();", connectionGroup=self.connectionGroups['MAD_ETL']),
            SqlCommand(job=job, sqlStatement="select avs.init_streets_nearest();", connectionGroup=self.connectionGroups['MAD_ETL']),
            SqlCommand(job=job, sqlStatement="select avs.init_avs_addresses();", connectionGroup=self.connectionGroups['MAD_ETL']),
            AvsLoadCommand(job=job, connectionGroup=self.connectionGroups['MAD_ETL']),
            SqlCommand(job=job, sqlStatement="update avs.avs_addresses set exception_text = 'programming error - row not processed' where address_base_id is null and exception_text is null;", connectionGroup=self.connectionGroups['MAD_ETL']),
            SqlCommand(job=job, sqlStatement="select public._eas_validate_addresses_after_etl('ALL');", connectionGroup=self.connectionGroups['MAD_ETL']),
            SqlCommand(job=job, sqlStatement="delete from public.d_address_disposition where disposition_description = 'provisional';", connectionGroup=self.connectionGroups['MAD_ETL']),
            SqlCommand(job=job, sqlStatement="vacuum analyze;", connectionGroup=self.connectionGroups['MAD_ETL'], useTransaction=False),
            # report on results
            RemoveDirectoryTreeCommand(job=job, targetDir=os.path.join(getDataFilePath(), 'avs_load_report')),
            SqlCommand(job=job, sqlStatement="select avs.qa();", connectionGroup=self.connectionGroups['MAD_ETL']),
            DumpTablesCommand(job=job, connectionGroup=self.connectionGroups['MAD_ETL'], tableNames=['avs.vw_load_summary', 'avs.vw_load_results', 'avs.qa'], targetDir=reportDir),
            ZipCommand(job=job, sourceDir=os.path.join(getDataFilePath(), 'avs_load_report'), targetFile=os.path.join(getDataFilePath(), 'avs_load_report.zip')),
            EmailCommand(job=job, subject='AVS Load Report', text='See attached.', attachFiles=[os.path.join(getDataFilePath(), 'avs_load_report.zip')]),
            FtpCommand(job=job, connectionGroup=self.connectionOrganizer.ftpConnections['DBI2SFGIS'], sourcePath=os.path.join(reportDir, 'avs.vw_load_results.csv'), targetPath='/dbi-external/avsEasLoadResults.csv', direction='PUT')
        ]

{code}

This

...

python

...

code

...

calls

...

into

...

a

...

DB

...

proc

...

(avs.load)

...

which

...

then

...

does

...

the

...

heavy

...

lifting.

...

The

...

majority

...

of

...

the

...

other

...

code

...

is

...

here

...

...

with

...

these

...

being

...

the

...

key

...

pieces

...

...

...

...

...

...

...

The

...

execution

...

path

...

is

...

not

...

trivial

...

so

...

I'll

...

walk

...

through

...

the

...

process

...

in

...

english

...

(below).

...


By

...

necessity

...

I

...

do

...

a

...

lot

...

of

...

generalizing

...

but

...

I

...

also

...

link

...

to

...

the

...

code.

...

In

...

case

...

you

...

are

...

curious,

...

the

...

key

...

parts

...

of

...

the

...

EAS

...

data

...

model

...

are

...

here

...

...

OK,

...

off

...

we

...

go...

...

We

...

start

...

by

...

doing

...

to

...

blanket

...

validations

...

and

...

standardizing

...

some

...

values.

...


This

...

is

...

done

...

in

...

this

...

db

...

proc:

...

...

and

...

includes

...

the

...

following

...

'invalid

...

street

...

number

...

suffix'

...


The

...

domain

...

values

...

are

...

here

...

...

'street

...

name

...

does

...

not

...

exist'

...


The

...

domain

...

values

...

are

...

specified

...

by

...

DPW.

...


We

...

do

...

not

...

use

...

fuzzy

...

string

...

matching.

...

'street

...

suffix

...

does

...

not

...

exist

...

in

...

street

...

dataset'

...


The

...

domain

...

values

...

are

...

specified

...

by

...

DPW.

...


We

...

do

...

not

...

use

...

fuzzy

...

string

...

matching.

...

'street

...

-

...

street

...

suffix

...

combination

...

does

...

not

...

exist'

...


The

...

domain

...

values

...

are

...

specified

...

by

...

DPW.

...


We

...

do

...

not

...

use

...

fuzzy

...

string

...

matching.

...

'referenced

...

parcel

...

has

...

no

...

geometry'

...


parcel

...

data

...

from

...

DPW

...

'no

...

matching

...

block

...

-

...

lot'

...


parcel

...

data

...

from

...

DPW

...

'block

...

lot

...

values

...

are

...

inconsistent'

...


Check

...

for

...

consistency

...

across

...

the

...

columnes

...

block,

...

lot,

...

and

...

block-lot.

...

'length

...

of

...

concatenated

...

unit

...

num

...

exceeds

...

10'

...


We

...

concatenate

...

avsa.unit

...

and

...

avsa.unit_sfx

...

using

...

http://code.google.com/p/eas/source/browse/trunk/etl/sql/avs_load/f_concatenate_unit_ddl.sql

...


The

...

result

...

must

...

fit

...

into

...

char

...

10.

...

'invalid

...

end

...

date

...

value'

...


Some

...

of

...

the

...

end

...

date

...

values

...

cannot

...

be

...

cats

...

into

...

the

...

date

...

type.

...

We

...

also

...

standardize

...

dates,

...

"unit

...

number"

...

values,

...

and

...

street

...

number

...

suffixes.

...

At

...

this

...

point

...

we

...

have

...

"excepted"

...

addresses

...

that

...

we

...

know

...

we

...

cannot

...

process.

...


We

...

take

...

each

...

remaining

...

address

...

and

...

try

...

to

...

load

...

it

...

into

...

EAS.

...


This

...

proceeds

...

from

...

the

...

base

...

address,

...

to

...

the

...

unit

...

address,

...

to

...

the

...

unit

...

address

...

-

...

parcel

...

link.

...


This

...

is

...

detailed

...

here

...

in

...

...

The

...

main

...

load

...

proc

...

calls

...

into

...

these

...

procs

...

in

...

this

...

order

...

...

...

...

With

...

the

...

most

...

interesting

...

work

...

being

...

done

...

by

...

f_process_address_base.

...


I

...

will

...

describe

...

this

...

now

...

and

...

will

...

do

...

a

...

good

...

bit

...

of

...

generalizing.

...


We

...

try

...

to

...

use

...

an

...

existing

...

adddress

...

and

...

create

...

a

...

new

...

one

...

if

...

necessary.

...


If

...

we

...

create

...

a

...

new

...

one,

...

we

...

have

...

to

...

find

...

the

...

best

...

matching

...

street

...

segment.

...


This

...

is

...

easily

...

the

...

most

...

complicated

...

process.

...


The

...

segment

...

must

...

be

...

within

...

500

...

feet

...

of

...

the

...

specified

...

parcel,

...

and

...

the

...

street

...

name

...

and

...

street

...

suffix

...

must

...

match

...

the

...

source

...

data.

...


Again,

...

we

...

do

...

not

...

use

...

fuzzy

...

string

...

matching

...

(such

...

as

...

Levenshtein

...

etc).

...

If

...

all

...

goes

...

well,

...

we

...

insert

...

the

...

base

...

address.

...


There

...

are

...

various

...

constraints

...

that

...

protect

...

data

...

integrity,

...

specifically

...

to

...

prevent

...

duplicate

...

active

...

base

...

addresses.

...


You

...

can

...

see

...

the

...

trigger

...

enforced

...

constraints

...

here

...

...

If

...

the

...

ETL

...

finds

...

or

...

creates

...

a

...

base

...

address,

...

it

...

will

...

then

...

insert

...

the

...

unit

...

address

...

if

...

it

...

is

...

specified.

...


Again,

...

we

...

do

...

not

...

allow

...

multiple

...

active

...

duplicates.

...

...

Finally,

...

we

...

insert

...

an

...

"address

...

-

...

parcel

...

link".

...


Here

...

again

...

we

...

do

...

not

...

allow

...

duplicates.

...

...

At

...

the

...

end

...

of

...

all

...

this

...

processing

...

we

...

report

...

results

...

in

...

three

...

ways

...

  • summary

...

  • results

...

  • detail

...

  • on

...

  • all

...

  • AVS

...

  • rows

...

  • quality

...

  • assurance

...

  • on

...

  • selected

...

  • data

...

An

...

example

...

of

...

this

...

attached

...

here:

...

avs_load_report.zip

...

The

...

summary

...

results

...

are

...

shown

...

in

...

the

...

table

...

below.

...


These

...

results

...

are

...

from

...

2011-12-07.

...


The

...

first

...

row

...

represents

...

rows

...

from

...

AVS

...

that

...

were

...

successfully

...

inserted

...

from

...

AVS.

...

Wiki Markup
{csv}
exception_text_general,count,percentage
,324018,96.3671
referenced parcel has no geometry,5409,1.6087
no matching block - lot,3926,1.1676
insert rejected - address_x_parcels - the parcel is not within the block number range,1172,0.3486
your street suffix appears to be incorrect ,929,0.2763
unable to find nearby street ,357,0.1062
street name does not exist,321,0.0955
insert rejected - address_x_parcels - the parcel is not within within 500 feet of the address,87,0.0259
length of concatenated unit num exceeds 10,5,0.0015
block lot values are inconsistent,4,0.0012
street suffix does not exist in street dataset,3,0.0009
invalid end date value,2,0.0006
{csv}

...