Statistical Analysis of Data on Traffic Accidents

I. Introduction

II. Methodology

III. Data Management

Summary Statistics and Exploratory Data Analysis:

--Internal factors that may affect injury level

    Alcohol

    Age

    Gender

--External factors that may affect injury level

    Hour of the Day

    Number of vehicles

    Light conditions

    Make, model, and year of the vehicles

    Region

Data Merging & Cleaning
Data Merging Steps
Outside data sources:

--Car safety rating

We expect that certain features and conditions of a vehicle influence the maximum injury level sustained in an accident. We therefore added a car safety rating column for the specific vehicle involved in each accident. We pulled the ratings from the website of the U.S. Department of Transportation, where we found data from 1990 to 2014. The raw data needed preparation before it could be used: it covers many different types of cars and carries a lot of information we do not need. To simplify, we kept only the year, make, and model of each vehicle and combined the rows that share the same year, make, and model. The remaining attributes of a car, such as drive type (AWD or FWD), are therefore no longer considered. After importing the data, we did this work in R.
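
The collapsing step can be sketched as follows. This is a minimal Python illustration, not the R code we actually ran. It assumes that rows sharing a year, make, and model are combined by averaging their ratings, which the text above does not specify. The field names and the sample rows are hypothetical placeholders.

```python
from collections import defaultdict

def collapse_ratings(rows):
    """Average the rating over all rows that share (year, make, model)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of ratings, row count]
    for row in rows:
        # Normalize text so that "MakeA" and " makea " fall into the same group.
        key = (row["year"], row["make"].strip().upper(), row["model"].strip().upper())
        totals[key][0] += row["rating"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical rows: the same model appears once per drive type.
rows = [
    {"year": 2012, "make": "MakeA", "model": "ModelX", "drive": "AWD", "rating": 5},
    {"year": 2012, "make": "MakeA", "model": "ModelX", "drive": "FWD", "rating": 4},
    {"year": 2010, "make": "MakeB", "model": "ModelY", "drive": "FWD", "rating": 4},
]

# One rating per (year, make, model); drive type no longer distinguishes rows.
ratings = collapse_ratings(rows)
```

Averaging is only one possible rule. Taking the minimum rating or the most common rating would fit the same structure.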

--Car crash cost by state

To estimate the economic cost of a vehicle accident accurately, we merged several external datasets into our original GES dataset. From the State-Specific Cost of Crash Deaths Fact Sheets, we obtained the total cost of crash-related deaths in each state in 2013. From the second external dataset, Fatal Crash Totals State by State, we obtained the total number of fatal crashes in each state. Dividing each state's cost of fatal crashes by its total number of fatal crashes gave a new variable, cost per crash in 2013. We then averaged cost per crash across all states and set that average to 1 in a new column called cost adjuster, so each state's cost adjuster is its cost per crash relative to the average. The cost adjuster is the index we used to estimate the cost of the other crash types. In addition, we used national average costs for Type A, Type B, Type C, and property-damage-only crashes. Combining the national average costs with the cost index gave the estimated cost of each type of crash in each state.
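
The arithmetic described above can be sketched in Python. This is an illustration under stated assumptions: the state names and all numbers are made-up placeholders, not values from the fact sheets, and the function names are ours. The average is taken as a simple unweighted mean across states, as the text describes.

```python
def cost_adjusters(total_cost, fatal_crashes):
    """Return each state's cost per fatal crash relative to the all-state average."""
    cost_per_crash = {s: total_cost[s] / fatal_crashes[s] for s in total_cost}
    # Unweighted mean across states; this average corresponds to an adjuster of 1.
    average = sum(cost_per_crash.values()) / len(cost_per_crash)
    return {s: c / average for s, c in cost_per_crash.items()}

def estimated_costs(adjusters, national_avg_cost):
    """Scale the national average cost of each crash type by the state's adjuster."""
    return {
        state: {crash_type: adj * cost for crash_type, cost in national_avg_cost.items()}
        for state, adj in adjusters.items()
    }

# Placeholder inputs (hypothetical values, for illustration only).
total_cost = {"State1": 100.0, "State2": 90.0}   # total cost of crash deaths
fatal_crashes = {"State1": 50, "State2": 30}     # number of fatal crashes
national_avg_cost = {"TypeA": 1000.0, "TypeB": 500.0, "TypeC": 250.0, "PDO": 100.0}

adjusters = cost_adjusters(total_cost, fatal_crashes)
costs = estimated_costs(adjusters, national_avg_cost)
```

With these placeholder inputs, State1 has a cost per crash of 2.0 against an average of 2.5, so its adjuster is below 1 and all of its estimated costs fall below the national averages.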

IV. Statistical Analysis

Multinomial Logistic Regression (Survey Logistic):

--Reasons:

    The response variable is categorical with ordinal levels.

    Easy to interpret for each level (compared with a decision tree or neural network).

    Given survey data, we need to consider sample weights and sampling units.
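
To illustrate why the sample weights matter, the sketch below compares an unweighted and a weighted share of one injury level. The records, the field names `sev` and `weight`, and the weight values are hypothetical. The sketch covers only point estimates; sampling units affect the variance estimates, which it does not attempt to show.

```python
def weighted_share(records, level):
    """Share of the total sample weight that falls in the given injury level."""
    total = sum(r["weight"] for r in records)
    in_level = sum(r["weight"] for r in records if r["sev"] == level)
    return in_level / total

# Hypothetical sampled crashes; each weight is the number of crashes a record represents.
records = [
    {"sev": "Fatal", "weight": 10.0},
    {"sev": "TypeC", "weight": 200.0},
    {"sev": "TypeC", "weight": 150.0},
    {"sev": "Fatal", "weight": 40.0},
]

# Treating every record equally overstates the share of fatal crashes here.
unweighted = sum(r["sev"] == "Fatal" for r in records) / len(records)
weighted = weighted_share(records, "Fatal")
```

In this made-up sample, half of the records are fatal crashes, but they carry only 50 of the 400 total weight.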

--Explanatory Variables:

--Response Variable: Maximum injury severity (Binned)

--Parameter Estimates:

Analysis of Maximum Likelihood Estimates
Parameter (Level)   SEV   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   Fatal 1 -7.4259 0.6789 119.6389 <.0001
Intercept   TypeA 1 -4.0718 0.3332 149.2945 <.0001
Intercept   TypeB 1 -2.8512 0.2386 142.7981 <.0001
Intercept   TypeC 1 -1.9481 0.1777 120.2266 <.0001
IMP_TRAV_SP   Fatal 1 0.0437 0.00301 210.7428 <.0001
IMP_TRAV_SP   TypeA 1 0.0250 0.00153 267.2686 <.0001
IMP_TRAV_SP   TypeB 1 0.0178 0.00164 116.7216 <.0001
IMP_TRAV_SP   TypeC 1 0.00391 0.00150 6.8083 0.0091
IMP_VE_TOTAL   Fatal 1 0.5447 0.1533 12.6253 0.0004
IMP_VE_TOTAL   TypeA 1 0.3339 0.0569 34.4964 <.0001
IMP_VE_TOTAL   TypeB 1 0.4735 0.0289 268.8647 <.0001
IMP_VE_TOTAL   TypeC 1 0.4740 0.0313 229.9680 <.0001
IMP_HOUR_IM   Fatal 1 -0.00891 0.00950 0.8813 0.3478
IMP_HOUR_IM   TypeA 1 -0.00696 0.00409 2.8951 0.0889
IMP_HOUR_IM   TypeB 1 -0.00122 0.00273 0.2006 0.6542
IMP_HOUR_IM   TypeC 1 0.00186 0.00329 0.3191 0.5721
IMP_VSPD_LIM   Fatal 1 -0.00523 0.0117 0.1990 0.6555
IMP_VSPD_LIM   TypeA 1 -0.00506 0.00298 2.8924 0.0890
IMP_VSPD_LIM   TypeB 1 -0.00915 0.00264 12.0560 0.0005
IMP_VSPD_LIM   TypeC 1 0.000573 0.00276 0.0429 0.8359
IMP_SEX_IM 2 Fatal 1 -0.2408 0.0642 14.0835 0.0002
IMP_SEX_IM 2 TypeA 1 -0.0249 0.0292 0.7296 0.3930
IMP_SEX_IM 2 TypeB 1 0.0959 0.0173 30.7440 <.0001
IMP_SEX_IM 2 TypeC 1 0.1885 0.0186 102.2626 <.0001
IMP_VSURCOND 0 Fatal 1 0.2439 0.4527 0.2902 0.5901
IMP_VSURCOND 0 TypeA 1 -0.0921 0.1729 0.2837 0.5943
IMP_VSURCOND 0 TypeB 1 -0.0945 0.0979 0.9312 0.3346
IMP_VSURCOND 0 TypeC 1 -0.2000 0.0985 4.1191 0.0424
IMP_VSURCOND 2 Fatal 1 -0.2961 0.1313 5.0893 0.0241
IMP_VSURCOND 2 TypeA 1 -0.2221 0.0766 8.4014 0.0037
IMP_VSURCOND 2 TypeB 1 -0.0709 0.0404 3.0721 0.0796
IMP_VSURCOND 2 TypeC 1 0.00447 0.0417 0.0115 0.9145
IMP_VSURCOND 3 Fatal 1 -0.7170 0.4807 2.2251 0.1358
IMP_VSURCOND 3 TypeA 1 -0.7073 0.1533 21.2966 <.0001
IMP_VSURCOND 3 TypeB 1 -0.5476 0.1804 9.2095 0.0024
IMP_VSURCOND 3 TypeC 1 -0.1994 0.1094 3.3245 0.0683
IMP_VSURCOND 4 Fatal 1 -0.2940 0.5883 0.2498 0.6172
IMP_VSURCOND 4 TypeA 1 -0.1082 0.2507 0.1864 0.6659
IMP_VSURCOND 4 TypeB 1 -0.1770 0.1842 0.9235 0.3365
IMP_VSURCOND 4 TypeC 1 -0.0963 0.1303 0.5460 0.4600
IMP_VSURCOND 5 Fatal 1 1.9986 0.9485 4.4399 0.0351
IMP_VSURCOND 5 TypeA 1 -0.1512 1.0079 0.0225 0.8807
IMP_VSURCOND 5 TypeB 1 1.7020 0.4147 16.8418 <.0001
IMP_VSURCOND 5 TypeC 1 -0.5725 0.6821 0.7045 0.4013
IMP_VSURCOND 6 Fatal 1 -7.6101 0.2264 1130.1624 <.0001
IMP_VSURCOND 6 TypeA 1 -0.4166 0.4835 0.7424 0.3889
IMP_VSURCOND 6 TypeB 1 -0.1424 0.2561 0.3090 0.5783
IMP_VSURCOND 6 TypeC 1 0.3034 0.2900 1.0945 0.2955
IMP_VSURCOND 7 Fatal 1 -9.8958 0.6651 221.3908 <.0001
IMP_VSURCOND 7 TypeA 1 3.0866 0.9464 10.6373 0.0011
IMP_VSURCOND 7 TypeB 1 2.7599 0.7791 12.5478 0.0004
IMP_VSURCOND 7 TypeC 1 2.1229 0.8017 7.0122 0.0081
IMP_VSURCOND 8 Fatal 1 0.5991 0.9389 0.4072 0.5234
IMP_VSURCOND 8 TypeA 1 -0.2379 0.4093 0.3378 0.5611
IMP_VSURCOND 8 TypeB 1 -1.0340 0.4175 6.1326 0.0133
IMP_VSURCOND 8 TypeC 1 0.1314 0.6673 0.0387 0.8440
IMP_VSURCOND 10 Fatal 1 0.2950 0.9060 0.1060 0.7448
IMP_VSURCOND 10 TypeA 1 0.5769 0.5274 1.1964 0.2740
IMP_VSURCOND 10 TypeB 1 -0.1235 0.4046 0.0932 0.7602
IMP_VSURCOND 10 TypeC 1 -0.3136 0.1813 2.9910 0.0837
IMP_VSURCOND 11 Fatal 1 0.9102 0.5553 2.6863 0.1012
IMP_VSURCOND 11 TypeA 1 0.9708 0.5045 3.7026 0.0543
IMP_VSURCOND 11 TypeB 1 0.3795 0.2049 3.4293 0.0641
IMP_VSURCOND 11 TypeC 1 -0.3508 0.4601 0.5814 0.4457
IMP_VSURCOND 9999 Fatal 1 -0.3257 0.5982 0.2964 0.5862
IMP_VSURCOND 9999 TypeA 1 -1.4482 0.3909 13.7292 0.0002
IMP_VSURCOND 9999 TypeB 1 -1.2454 0.2747 20.5476 <.0001
IMP_VSURCOND 9999 TypeC 1 -0.9124 0.1973 21.3917 <.0001
REGION 2 Fatal 1 0.2511 0.3193 0.6183 0.4317
REGION 2 TypeA 1 0.1046 0.1476 0.5024 0.4785
REGION 2 TypeB 1 0.3519 0.1923 3.3491 0.0672
REGION 2 TypeC 1 -0.4874 0.1403 12.0703 0.0005
REGION 3 Fatal 1 0.2416 0.2975 0.6597 0.4167
REGION 3 TypeA 1 0.2395 0.2151 1.2391 0.2656
REGION 3 TypeB 1 0.3612 0.2137 2.8575 0.0909
REGION 3 TypeC 1 -0.3170 0.1702 3.4708 0.0625
REGION 4 Fatal 1 0.7223 0.3218 5.0374 0.0248
REGION 4 TypeA 1 0.3549 0.1984 3.2008 0.0736
REGION 4 TypeB 1 0.5830 0.2089 7.7917 0.0052
REGION 4 TypeC 1 -0.0211 0.1456 0.0210 0.8849
IMP_SAFETY_RATING   Fatal 1 -0.1740 0.1058 2.7020 0.1002
IMP_SAFETY_RATING   TypeA 1 -0.2152 0.0320 45.1180 <.0001
IMP_SAFETY_RATING   TypeB 1 -0.1682 0.0270 38.7430 <.0001
IMP_SAFETY_RATING   TypeC 1 -0.1235 0.0283 19.0761 <.0001
IMP_LAND_USE 2 Fatal 1 -0.4506 0.3823 1.3896 0.2385
IMP_LAND_USE 2 TypeA 1 -0.1101 0.1645 0.4476 0.5035
IMP_LAND_USE 2 TypeB 1 -0.00905 0.1014 0.0080 0.9289
IMP_LAND_USE 2 TypeC 1 -0.1108 0.1295 0.7319 0.3923
IMP_LAND_USE 3 Fatal 1 -0.3227 0.2114 2.3298 0.1269
IMP_LAND_USE 3 TypeA 1 0.0502 0.1534 0.1069 0.7437
IMP_LAND_USE 3 TypeB 1 -0.0747 0.1021 0.5351 0.4645
IMP_LAND_USE 3 TypeC 1 0.1651 0.0758 4.7445 0.0294
IMP_LAND_USE 8 Fatal 1 0.2121 0.1803 1.3838 0.2395
IMP_LAND_USE 8 TypeA 1 0.2737 0.0810 11.4283 0.0007
IMP_LAND_USE 8 TypeB 1 0.2910 0.0938 9.6167 0.0019
IMP_LAND_USE 8 TypeC 1 0.0988 0.0710 1.9399 0.1637
IMP_driver_age   Fatal 1 0.0118 0.00278 18.0843 <.0001
IMP_driver_age   TypeA 1 0.00556 0.00148 14.1946 0.0002
IMP_driver_age   TypeB 1 0.00104 0.00128 0.6607 0.4163
IMP_driver_age   TypeC 1 0.00204 0.000815 6.2645 0.0123
overspeed 1 Fatal 1 0.6434 0.1508 18.2100 <.0001
overspeed 1 TypeA 1 0.6316 0.1057 35.7132 <.0001
overspeed 1 TypeB 1 0.5588 0.0801 48.6655 <.0001
overspeed 1 TypeC 1 0.3216 0.0775 17.2043 <.0001
IMP_ALCHL_IM 1 Fatal 1 2.3002 0.1526 227.2388 <.0001
IMP_ALCHL_IM 1 TypeA 1 1.6115 0.0857 353.6302 <.0001
IMP_ALCHL_IM 1 TypeB 1 1.0581 0.0795 177.1651 <.0001
IMP_ALCHL_IM 1 TypeC 1 0.3848 0.0998 14.8645 0.0001
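
As a reading aid for the table above, the sketch below turns the estimates for a continuous variable into odds ratios. It assumes a generalized logit model in which the severity level missing from the table is the reference, so that exp(estimate) is the multiplicative change in the odds of a level versus the reference per one-unit increase. Only the continuous variable IMP_TRAV_SP is used, because the coding of the class variables is not stated here. The function names are ours.

```python
import math

# Estimates for IMP_TRAV_SP copied from the table above, one per severity level.
trav_sp = {"Fatal": 0.0437, "TypeA": 0.0250, "TypeB": 0.0178, "TypeC": 0.00391}

def odds_ratio(estimate, delta=1.0):
    """Multiplicative change in the odds (vs. the reference level) for a delta-unit increase."""
    return math.exp(estimate * delta)

for level, beta in trav_sp.items():
    print(f"{level}: x{odds_ratio(beta):.4f} per unit, x{odds_ratio(beta, 10):.3f} per 10 units")

def glogit_probs(linear_predictors):
    """Convert per-level linear predictors into probabilities.

    The reference level has linear predictor 0, so its term in the sum is 1.
    """
    denom = 1.0 + sum(math.exp(v) for v in linear_predictors.values())
    probs = {level: math.exp(v) / denom for level, v in linear_predictors.items()}
    probs["Reference"] = 1.0 / denom
    return probs

# Intercept-only predictors (every other term set to zero), shown only to
# exercise the function; this is not a realistic crash profile.
baseline = glogit_probs({"Fatal": -7.4259, "TypeA": -4.0718, "TypeB": -2.8512, "TypeC": -1.9481})
```

The estimates for travel speed shrink from Fatal to TypeC, so under these assumptions a higher travel speed raises the odds of the most severe outcomes the most.
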
Cross Validation Results:

V. Results

VII. Accident Price Explorer System

Check to use the safety rating; uncheck to use the vehicle make, model, and year.