[Openspace] Spatial Poisson Regression
mburkey at triad.rr.com
Sun Jul 18 12:25:08 CDT 2004
I am aware that, at present, it is not possible to correctly estimate a model
with a count dependent variable. I want to get some feedback on a
process I used, to ensure that my assumptions are correct, and the process
and conclusions seem reasonable. I appreciate all feedback!
Basic setup: Descriptive regression on the number of locations of various
types of businesses using approximately 800 Zip Code Tabulation Areas.
Explanatory variables such as population, race, income, etc. are used.
1) I estimated a Poisson model (MLE with log link) without including lagged
values of Y. ASSUMPTION: If the true value of the spatial coefficient p in
pWy is nonzero, this could cause omitted variable bias in the B's of the
model.
2) To check for bias, I included Wy (calculated using GeoDa) as an
explanatory variable in a Poisson model (estimated outside of GeoDa).
ASSUMPTION: By doing this I think my coefficient estimates will be unbiased,
but the standard errors will not be computed correctly. Is this correct?
3) When I compared the results from #1 and #2 above, the coefficients were
very similar. Therefore I concluded that though there is likely spatial
autocorrelation, omitting the lagged values did not appear to significantly
bias the coefficients.
4) As a final check, I followed a suggestion from Cameron and Trivedi. They
suggest that if you must estimate a count data model that can't be estimated
properly with existing software, you run a log-linear model with the ad-hoc
solution of converting all y to (y+1) or converting all zeros to 0.5. So, I
ran several variants of this model in GeoDa. The Spatial Lag model seemed
the best specification, and once again, the signs, size, and significance of
the coefficients on the major explanatory variables were in the same
ballpark as with the other models.
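For anyone who wants to try the comparison in items 1 through 3, here is a
minimal sketch of it. It is not my actual analysis: the 20 x 20 grid, the rook
contiguity weights, the simulated data, and the helper names (rook_weights,
row_standardize, fit_poisson) are all assumptions made up for illustration,
and the Poisson fit is a plain iteratively reweighted least squares routine
rather than any particular package. It only compares coefficients with and
without Wy; it does nothing about the standard-error question in item 2.

```python
import numpy as np

def rook_weights(nrow, ncol):
    """Binary rook-contiguity matrix for a regular grid (hypothetical layout)."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if r + 1 < nrow:                 # neighbor below
                j = (r + 1) * ncol + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < ncol:                 # neighbor to the right
                j = r * ncol + c + 1
                W[i, j] = W[j, i] = 1.0
    return W

def row_standardize(W):
    """Scale each row to sum to 1, so W @ y is the mean of neighboring y."""
    W = np.asarray(W, dtype=float)
    sums = W.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0                    # isolated areas keep a zero row
    return W / sums

def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Poisson MLE with log link via IRLS; column 0 of X must be a constant."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())               # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu              # working response
        XtW = X.T * mu                       # IRLS weights equal mu for Poisson
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Simulated stand-in data: one covariate, no true spatial dependence.
rng = np.random.default_rng(0)
W = row_standardize(rook_weights(20, 20))
x = rng.normal(size=W.shape[0])
y = rng.poisson(np.exp(0.5 + 0.3 * x))
Wy = W @ y                                   # spatial lag of the counts

ones = np.ones_like(x)
b_plain = fit_poisson(np.column_stack([ones, x]), y)       # item 1
b_lag = fit_poisson(np.column_stack([ones, x, Wy]), y)     # item 2

print("without Wy:", b_plain)
print("with Wy:   ", b_lag)
print("change in slope on x:", b_lag[1] - b_plain[1])      # item 3
```

Because the simulated counts have no spatial dependence, the slope on x should
barely move when Wy is added. With real data the size of that change is what
item 3 looks at.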
In a paper on this work, I included items 1, 2, and 3 above, but omitted
discussion of item 4.
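The two ad-hoc transformations in item 4 can be sketched as follows. This is
only an illustration under stated assumptions: the counts and the covariate
are invented, the function names are hypothetical, and the fit is ordinary
least squares without any spatial term, not the Spatial Lag model from GeoDa.

```python
import numpy as np

def log_plus_one(y):
    """Log of (y + 1): every count is shifted up by one before taking logs."""
    return np.log(np.asarray(y, dtype=float) + 1.0)

def log_zero_to_half(y):
    """Log of y, with zeros replaced by 0.5; positive counts are unchanged."""
    y = np.asarray(y, dtype=float)
    return np.log(np.where(y == 0, 0.5, y))

# Invented example data: business counts and one explanatory variable.
counts = np.array([0, 1, 2, 0, 5, 3, 1, 8])
x = np.array([0.1, 0.5, 0.9, 0.2, 1.8, 1.2, 0.4, 2.3])
X = np.column_stack([np.ones(len(x)), x])

# Plain OLS on each transformed dependent variable (no spatial lag here).
beta_a, *_ = np.linalg.lstsq(X, log_plus_one(counts), rcond=None)
beta_b, *_ = np.linalg.lstsq(X, log_zero_to_half(counts), rcond=None)

print("log(y + 1) fit:      ", beta_a)
print("zeros -> 0.5 fit:    ", beta_b)
```

The transformed variable from either function is what would be passed to a
spatial lag routine in place of the raw counts.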
Are my assumptions, conclusions, and process above reasonable? What better
suggestions are there for working with spatial count data? Any references
on the subject?
Thank you. If anyone would like to see the paper, feel free to email me.
Mark L. Burkey
burkeym at ncat.edu