-------------------------------------------------------------------------------
      name:
       log:  D:\Stata\week1.txt
  log type:  text
 opened on:  17 Sep 2019, 16:47:50

. // Let's also create a command log:
. cmdlog using cmdweek1.txt,replace
(cmdlog D:\Stata\cmdweek1.txt opened)

. // We will learn what it is later.
. *******************************
. *** View Data
. *******************************
. *** Open a Dataset
. sysuse auto,clear // clear existing dataset in memory
(1978 Automobile Data)

. // Check what the variables and properties panels look like
. *** Think of a few ways you would like to explore the data:
. *** The most direct way is simply to take a look at the data:
. browse

. // Structure of the data
. // Explain color, missing, categorical data
. // Note that you can only read the data without editing it.
. // In order to edit the data, type "edit" instead:
. edit

. // I strongly discourage you from editing data this way.
. *** To know more details about the data (structure)
. describe

Contains data from F:\Program Files\Stata16\ado\base/a/auto.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          13 Apr 2018 17:45
                                              (_dta has notes)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair Record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn Circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
-------------------------------------------------------------------------------
Sorted by: foreign

. // clear more
. // elements displayed
. *** Summary statistics
. summarize

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        make |          0
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
       rep78 |         69    3.405797    .9899323          1          5
    headroom |         74    2.993243    .8459948        1.5          5
-------------+---------------------------------------------------------
       trunk |         74    13.75676    4.277404          5         23
      weight |         74    3019.459    777.1936       1760       4840
      length |         74    187.9324    22.26634        142        233
        turn |         74    39.64865    4.399354         31         51
displacement |         74    197.2973    91.83722         79        425
-------------+---------------------------------------------------------
  gear_ratio |         74    3.014865    .4562871       2.19       3.89
     foreign |         74    .2972973    .4601885          0          1

. // why does "make" have no observations? (it is a string variable)
. // rep78 has fewer obs because of missing values
. *** To learn the contents of the data in detail
. codebook make // unique values

-------------------------------------------------------------------------------
make                                                             Make and Model
-------------------------------------------------------------------------------

                  type:  string (str18), but longest is str17

         unique values:  74                       missing "":  0/74

              examples:  "Cad. Deville"
                         "Dodge Magnum"
                         "Merc. XR-7"
                         "Pont. Catalina"

               warning:  variable has embedded blanks

. codebook foreign // value labels

-------------------------------------------------------------------------------
foreign                                                                Car type
-------------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  origin

                 range:  [0,1]                        units:  1
         unique values:  2                        missing .:  0/74

            tabulation:  Freq.   Numeric  Label
                            52         0  Domestic
                            22         1  Foreign

. codebook rep78 // missings

-------------------------------------------------------------------------------
rep78                                                        Repair Record 1978
-------------------------------------------------------------------------------

                  type:  numeric (int)

                 range:  [1,5]                        units:  1
         unique values:  5                        missing .:  5/74

            tabulation:  Freq.  Value
                             2  1
                             8  2
                            30  3
                            18  4
                            11  5
                             5  .

. *** To learn more about the missing data
. browse if missing(rep78)

. list make if missing(rep78)

     +---------------+
     | make          |
     |---------------|
  3. | AMC Spirit    |
  7. | Buick Opel    |
 45. | Plym. Sapporo |
 51. | Pont. Phoenix |
 64. | Peugeot 604   |
     +---------------+

. *** More summary statistics, e.g., median
. summarize,detail // notice that mean is very small

                       Make and Model
-------------------------------------------------------------
no observations

                            Price
-------------------------------------------------------------
      Percentiles      Smallest
 1%         3291           3291
 5%         3748           3299
10%         3895           3667       Obs                  74
25%         4195           3748       Sum of Wgt.          74

50%       5006.5                      Mean           6165.257
                        Largest       Std. Dev.      2949.496
75%         6342          13466
90%        11385          13594       Variance        8699526
95%        13466          14500       Skewness       1.653434
99%        15906          15906       Kurtosis       4.819188

                        Mileage (mpg)
-------------------------------------------------------------
      Percentiles      Smallest
 1%           12             12
 5%           14             12
10%           14             14       Obs                  74
25%           18             14       Sum of Wgt.          74

50%           20                      Mean            21.2973
                        Largest       Std. Dev.      5.785503
75%           25             34
90%           29             35       Variance       33.47205
95%           34             35       Skewness       .9487176
99%           41             41       Kurtosis       3.975005

                     Repair Record 1978
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1              1
 5%            2              1
10%            2              2       Obs                  69
25%            3              2       Sum of Wgt.          69

50%            3                      Mean           3.405797
                        Largest       Std. Dev.      .9899323
75%            4              5
90%            5              5       Variance       .9799659
95%            5              5       Skewness      -.0570331
99%            5              5       Kurtosis       2.678086

                       Headroom (in.)
-------------------------------------------------------------
      Percentiles      Smallest
 1%          1.5            1.5
 5%          1.5            1.5
10%            2            1.5       Obs                  74
25%          2.5            1.5       Sum of Wgt.          74

50%            3                      Mean           2.993243
                        Largest       Std. Dev.      .8459948
75%          3.5            4.5
90%            4            4.5       Variance       .7157071
95%          4.5            4.5       Skewness       .1408651
99%            5              5       Kurtosis       2.208453

                    Trunk space (cu. ft.)
-------------------------------------------------------------
      Percentiles      Smallest
 1%            5              5
 5%            7              6
10%            8              7       Obs                  74
25%           10              7       Sum of Wgt.          74

50%           14                      Mean           13.75676
                        Largest       Std. Dev.      4.277404
75%           17             21
90%           20             21       Variance       18.29619
95%           21             22       Skewness       .0292034
99%           23             23       Kurtosis       2.192052

                        Weight (lbs.)
-------------------------------------------------------------
      Percentiles      Smallest
 1%         1760           1760
 5%         1830           1800
10%         2020           1800       Obs                  74
25%         2240           1830       Sum of Wgt.          74

50%         3190                      Mean           3019.459
                        Largest       Std. Dev.      777.1936
75%         3600           4290
90%         4060           4330       Variance       604029.8
95%         4290           4720       Skewness       .1481164
99%         4840           4840       Kurtosis       2.118403

                        Length (in.)
-------------------------------------------------------------
      Percentiles      Smallest
 1%          142            142
 5%          154            147
10%          157            149       Obs                  74
25%          170            154       Sum of Wgt.          74

50%        192.5                      Mean           187.9324
                        Largest       Std. Dev.      22.26634
75%          204            221
90%          218            222       Variance       495.7899
95%          221            230       Skewness      -.0409746
99%          233            233       Kurtosis        2.04156

                      Turn Circle (ft.)
-------------------------------------------------------------
      Percentiles      Smallest
 1%           31             31
 5%           33             32
10%           34             33       Obs                  74
25%           36             33       Sum of Wgt.          74

50%           40                      Mean           39.64865
                        Largest       Std. Dev.      4.399354
75%           43             46
90%           45             48       Variance       19.35431
95%           46             48       Skewness       .1238259
99%           51             51       Kurtosis       2.229458

                   Displacement (cu. in.)
-------------------------------------------------------------
      Percentiles      Smallest
 1%           79             79
 5%           86             85
10%           97             86       Obs                  74
25%          119             86       Sum of Wgt.          74

50%          196                      Mean           197.2973
                        Largest       Std. Dev.      91.83722
75%          250            350
90%          350            400       Variance       8434.075
95%          350            400       Skewness       .5916565
99%          425            425       Kurtosis       2.375577

                         Gear Ratio
-------------------------------------------------------------
      Percentiles      Smallest
 1%         2.19           2.19
 5%         2.28           2.24
10%         2.43           2.26       Obs                  74
25%         2.73           2.28       Sum of Wgt.          74

50%        2.955                      Mean           3.014865
                        Largest       Std. Dev.      .4562871
75%         3.37           3.78
90%         3.72           3.78       Variance       .2081979
95%         3.78           3.81       Skewness       .2191658
99%         3.89           3.89       Kurtosis       2.101812

                          Car type
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                  74
25%            0              0       Sum of Wgt.          74

50%            0                      Mean           .2972973
                        Largest       Std. Dev.      .4601885
75%            1              1
90%            1              1       Variance       .2117734
95%            1              1       Skewness       .8869686
99%            1              1       Kurtosis       1.786713

. browse if price>13000

. *** Learn the composition of foreign/domestic and repair records
. tabulate foreign

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

. tabulate rep78

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

.
. tabulate foreign rep78,row

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |                   Repair Record 1978
  Car type |         1          2          3          4          5 |     Total
-----------+-------------------------------------------------------+----------
  Domestic |         2          8         27          9          2 |        48
           |      4.17      16.67      56.25      18.75       4.17 |    100.00
-----------+-------------------------------------------------------+----------
   Foreign |         0          0          3          9          9 |        21
           |      0.00       0.00      14.29      42.86      42.86 |    100.00
-----------+-------------------------------------------------------+----------
     Total |         2          8         30         18         11 |        69
           |      2.90      11.59      43.48      26.09      15.94 |    100.00

. *** To compare gas mileage between foreign and domestic cars:
. *** Think about the potential ways to do this
. *** Method 1
. summarize mpg if foreign==0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         52    19.82692    4.743297         12         34

. summarize mpg if foreign==1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         22    24.77273    6.611187         14         41

. // note the double equal sign (==) used to test equality
. *** Method 2
. by foreign,sort: summarize mpg

-------------------------------------------------------------------------------
-> foreign = Domestic

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         52    19.82692    4.743297         12         34

-------------------------------------------------------------------------------
-> foreign = Foreign

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         22    24.77273    6.611187         14         41

. *** Method 3
.
. tabulate foreign, summarize(mpg)

            |      Summary of Mileage (mpg)
   Car type |        Mean   Std. Dev.       Freq.
------------+------------------------------------
   Domestic |   19.826923   4.7432972          52
    Foreign |   24.772727   6.6111869          22
------------+------------------------------------
      Total |   21.297297   5.7855032          74

.
. *** Hypothesis testing
. ttest mpg,by(foreign)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Domestic |      52    19.82692     .657777    4.743297    18.50638    21.14747
 Foreign |      22    24.77273     1.40951    6.611187    21.84149    27.70396
---------+--------------------------------------------------------------------
combined |      74     21.2973    .6725511    5.785503     19.9569    22.63769
---------+--------------------------------------------------------------------
    diff |           -4.945804    1.362162               -7.661225   -2.230384
------------------------------------------------------------------------------
    diff = mean(Domestic) - mean(Foreign)                         t =  -3.6308
Ho: diff = 0                                     degrees of freedom =       72

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0003         Pr(|T| > |t|) = 0.0005          Pr(T > t) = 0.9997

.
. *** Relationships between mpg and weight
. correlate mpg weight
(obs=74)

             |      mpg   weight
-------------+------------------
         mpg |   1.0000
      weight |  -0.8072   1.0000

.
. by foreign,sort: correlate mpg weight

-------------------------------------------------------------------------------
-> foreign = Domestic
(obs=52)

             |      mpg   weight
-------------+------------------
         mpg |   1.0000
      weight |  -0.8759   1.0000

-------------------------------------------------------------------------------
-> foreign = Foreign
(obs=22)

             |      mpg   weight
-------------+------------------
         mpg |   1.0000
      weight |  -0.6829   1.0000

. correlate mpg weight length turn displacement
(obs=74)

             |      mpg   weight   length     turn displa~t
-------------+---------------------------------------------
         mpg |   1.0000
      weight |  -0.8072   1.0000
      length |  -0.7958   0.9460   1.0000
        turn |  -0.7192   0.8574   0.8643   1.0000
displacement |  -0.7056   0.8949   0.8351   0.7768   1.0000

.
. *** Graph data
. twoway (scatter mpg weight)

. twoway (scatter mpg weight),by(foreign, total)

.
. *** Model fitting
. generate wtsq=weight^2

. regress mpg weight wtsq foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(3, 70)        =     52.25
       Model |  1689.15372         3   563.05124   Prob > F        =    0.0000
    Residual |   754.30574        70  10.7757963   R-squared       =    0.6913
-------------+----------------------------------   Adj R-squared   =    0.6781
       Total |  2443.45946        73  33.4720474   Root MSE        =    3.2827

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0165729   .0039692    -4.18   0.000    -.0244892   -.0086567
        wtsq |   1.59e-06   6.25e-07     2.55   0.013     3.45e-07    2.84e-06
     foreign |    -2.2035   1.059246    -2.08   0.041      -4.3161   -.0909002
       _cons |   56.53884   6.197383     9.12   0.000     44.17855    68.89913
------------------------------------------------------------------------------

. predict mpghat
(option xb assumed; fitted values)

. twoway (scatter mpg weight) (line mpghat weight,sort),by(foreign)

. generate gp100m=100/mpg

. label variable gp100m "Gallons per 100 miles"

. twoway (scatter gp100m weight),by(foreign,total)

. regress gp100m weight foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =    113.97
       Model |  91.1761694         2  45.5880847   Prob > F        =    0.0000
    Residual |  28.4000913        71  .400001287   R-squared       =    0.7625
-------------+----------------------------------   Adj R-squared   =    0.7558
       Total |  119.576261        73  1.63803097   Root MSE        =    .63246

------------------------------------------------------------------------------
      gp100m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   .0016254   .0001183    13.74   0.000     .0013896    .0018612
     foreign |   .6220535   .1997381     3.11   0.003     .2237871     1.02032
       _cons |  -.0734839   .4019932    -0.18   0.855    -.8750354    .7280677
------------------------------------------------------------------------------

.
. *** Close and save the log files before you go
. log close _all
      name:
       log:  D:\Stata\week1.txt
  log type:  text
 closed on:  17 Sep 2019, 16:47:58
-------------------------------------------------------------------------------
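
The comments in the session raise a few follow-up questions: why make shows no observations, how many values of rep78 are missing, and whether the equal-variance assumption behind the t test matters. A minimal do-file sketch of those checks follows. None of these commands were run in the logged session, so no output is shown; they are additions for illustration, and the factor-variable form of the regression is assumed to be an acceptable stand-in for the hand-made wtsq variable.

```stata
* Follow-up checks (assumption: NOT part of the logged session above)
sysuse auto, clear

* "make" is a string variable, so summarize reports 0 observations for it
describe make
ds, has(type string)

* Count the missing repair records (the codebook output reports 5 of 74)
count if missing(rep78)
misstable summarize rep78

* A fourth way to compare mileage by origin, adding the median
tabstat mpg, by(foreign) statistics(n mean sd median)

* t test without assuming equal variances (Welch approximation)
ttest mpg, by(foreign) unequal welch

* Same quadratic model as "regress mpg weight wtsq foreign",
* written with factor-variable notation instead of a generated wtsq
regress mpg c.weight##c.weight i.foreign
```

Saving these lines in a do-file and running them after the session keeps the exploration reproducible, which is also the purpose of the cmdlog opened at the start of the log.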