python:first_course_statistics


General instructions

Read the following important documentation about:

Save your scripts in a folder inside the data folder, calling the script folder 'my_scripts' or whatever you like. If 'my_scripts' is set as your current working directory, then the data files are available under the address '../[data file]', for instance: '../geyser1.TAB'
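The relative address described above can also be built explicitly. The following is only a minimal sketch: the helper name data_path and the folder '/data/my_scripts' are made up for illustration, and the helper assumes the data file sits exactly one level above the script folder.

```python
from pathlib import Path


def data_path(filename, script_dir=None):
    """Return the path of a data file stored one level above the script folder.

    script_dir defaults to the current working directory, which the
    general instructions assume to be the script folder.
    """
    base = Path(script_dir) if script_dir is not None else Path.cwd()
    return base.parent / filename


# Hypothetical layout: /data/my_scripts holds the scripts,
# /data/geyser1.TAB is the data file.
print(data_path('geyser1.TAB', script_dir='/data/my_scripts'))
```

The returned Path object can be passed directly to pd.read_csv in place of the string '../geyser1.TAB'.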

Eruptions of the "Old Faithful" geyser (p.5)


Histogram (p.5)

import pandas as pd
import matplotlib.pyplot as plt

# Read the tab-separated geyser data (read_csv already returns a DataFrame)
gys1 = pd.read_csv('../geyser1.TAB', sep='\t')
g_int = gys1['Interval']

# Histogram of the inter-eruption times with 20 bins, drawn in red
ax = plt.gca()
ax.hist(g_int, bins=20, color='r')
ax.set_xlabel('Inter-eruption time')
ax.set_ylabel('Frequency')
ax.set_title('Histogram')
plt.show()


Boxplot (p. 6)

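import matplotlib.pyplot as plt
import pandas as pd

# '...' stands for the elided data folder: replace it with your own path,
# e.g. '../geyser1.TAB' as in the general instructions.
# The raw string (r'...') keeps the backslash literal.
gysr1_boxplot = pd.read_csv(r'...\geyser1.TAB', sep='\t')
data_gysr1 = gysr1_boxplot['Interval']

# Box-and-whisker plot of the inter-eruption times
plt.boxplot(data_gysr1)
ax = plt.gca()
ax.set_xlabel('222 cases')
ax.set_ylabel('Inter-eruption time (minutes)')
ax.set_title('Box and Whisker Plot')
plt.show()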


ScatterPlot (p. 7)

AB: Set both facecolor and edgecolor to change the fill and the outline of each dot. Giving them two different values produces dots with one color inside and another outside.

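import matplotlib.pyplot as plt
import pandas as pd

# '...' stands for the elided data folder: replace it with your own path,
# e.g. '../geyser1.TAB' as in the general instructions.
# The raw string (r'...') keeps the backslash literal.
geysr1_scatterplot = pd.read_csv(r'...\geyser1.TAB', sep='\t')
geysr1_data_Xax = geysr1_scatterplot['Duration']
geysr1_data_Yax = geysr1_scatterplot['Interval']

# Yellow dots: fill (facecolor) and outline (edgecolor) use the same color
plt.scatter(geysr1_data_Xax, geysr1_data_Yax, facecolor='y', edgecolor='y')
ax = plt.gca()
ax.set_xlabel('Eruption duration time (minutes)')
ax.set_ylabel('Inter-eruption time (minutes)')
ax.set_title('Scatter Plot of INTERVAL vs DURATION')
plt.show()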
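To illustrate the remark about giving the inside and the outside of each dot different colors, here is a small sketch. The numbers are invented for the illustration and are not taken from geyser1.TAB; the color choice (yellow fill, black outline) is likewise only an example.

```python
import matplotlib.pyplot as plt

# Made-up (duration, interval) pairs, used only to show the two colors
durations = [1.8, 2.0, 3.5, 4.2, 4.5]
intervals = [55, 58, 75, 82, 88]

fig, ax = plt.subplots()
# Yellow fill with a black outline: facecolor and edgecolor differ
points = ax.scatter(durations, intervals, facecolor='y', edgecolor='k')
ax.set_xlabel('Eruption duration time (minutes)')
ax.set_ylabel('Inter-eruption time (minutes)')
ax.set_title('Different face and edge colors')

# plt.show()  # uncomment to display the figure in an interactive session
```

A dark outline tends to make light-colored dots easier to tell apart where they overlap.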


Descriptive statistics (p.9)

Note: try different examples, e.g. the whole population, only the cases where 'Duration' <= 3, or the whole dataframe

doc / example

import pandas as pd

gysr1 = pd.read_csv('../geyser1.TAB', sep='\t')
# Summary statistics of 'Duration' for the eruptions lasting at most 3 minutes
print(gysr1.loc[gysr1['Duration'] <= 3, 'Duration'].describe())
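The three variants suggested in the note can be compared side by side. This sketch uses a small made-up table in place of the real file, so the numbers it prints say nothing about Old Faithful. Reading "the whole population" as all values of one column is an assumption about what the note means.

```python
import pandas as pd

# Made-up values standing in for the geyser data (not the real measurements)
toy = pd.DataFrame({
    'Duration': [1.8, 2.0, 3.0, 3.5, 4.2, 4.5],
    'Interval': [55, 58, 60, 75, 82, 88],
})

# 1. One column, all rows
whole_column = toy['Duration'].describe()

# 2. One column, only the rows where Duration <= 3
short_only = toy.loc[toy['Duration'] <= 3, 'Duration'].describe()

# 3. The whole dataframe: one summary column per numeric column
whole_frame = toy.describe()

print(whole_column)
print(short_only)
print(whole_frame)
```

describe() returns a Series for a single column and a DataFrame for the whole table, so individual statistics can be read with, for instance, short_only['mean'].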


Boxplot (p.9)

Selecting rows in a dataframe: doc / example

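import matplotlib.pyplot as plt
import pandas as pd

gysr1 = pd.read_csv('../geyser1.TAB', sep='\t')
# Split the rows by eruption duration
gysr1_inf3 = gysr1.loc[gysr1['Duration'] <= 3]
gysr1_sup3 = gysr1.loc[gysr1['Duration'] > 3]
# Side-by-side boxplots of 'Interval'; the boxes sit at x positions 1 and 2
plt.boxplot([gysr1_inf3['Interval'], gysr1_sup3['Interval']])
plt.xticks([1, 2], ['inf3', 'sup3'])
plt.show()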
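The row selection used for the two boxplots relies on a boolean mask. The sketch below shows the mechanism on a made-up table (the values are illustrative, not the real data). The names is_short, short_rows and long_rows are chosen here only for the example.

```python
import pandas as pd

# Made-up values standing in for the geyser data (not the real measurements)
toy = pd.DataFrame({
    'Duration': [1.8, 2.0, 3.0, 3.5, 4.2, 4.5],
    'Interval': [55, 58, 60, 75, 82, 88],
})

# The comparison yields a boolean Series with one True/False per row
is_short = toy['Duration'] <= 3

# .loc keeps the rows where the mask is True
short_rows = toy.loc[is_short]
# ~ negates the mask; without missing values this equals Duration > 3
long_rows = toy.loc[~is_short]

print(short_rows)
print(long_rows)
```

If 'Duration' contained missing values, the negated mask would also select those rows, whereas an explicit 'Duration' > 3 comparison would not.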


International adoption rates (p.13)

Boxplot (p.14)

import matplotlib.pyplot as plt
import pandas as pd

# Raw string so the backslashes in the Windows path are taken literally
adopt_data = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\adopt.TAB', sep='\t')
adopt1 = adopt_data['Visa91']

# Box-and-whisker plot of the 1991 visa counts
plt.boxplot(adopt1)
ax = plt.gca()
ax.set_title('Box and Whisker Plot')
ax.set_xlabel('39 cases')
ax.set_ylabel('Number of visas in 1991')
plt.show()


Histogram (p.14)

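import matplotlib.pyplot as plt
import pandas as pd

# Raw string so the backslashes in the Windows path are taken literally
adopt_data = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\adopt.TAB', sep='\t')
adopt1 = adopt_data['Visa91']

# Histogram of the 1991 visa counts with the default number of bins
plt.hist(adopt1)
plt.show()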


Scatterplot (p. 17)

import matplotlib.pyplot as plt
import pandas as pd

# '...' stands for the elided data folder: replace it with your own path.
# The raw string is needed here: in a normal string '\a' is the bell character.
adoption_scatterplot = pd.read_csv(r'...\adopt.TAB', sep='\t')
adopt_data_Xax = adoption_scatterplot['Visa88']
adopt_data_Yax = adoption_scatterplot['Visa91']

plt.scatter(adopt_data_Xax, adopt_data_Yax, facecolor='y', edgecolor='y')
ax = plt.gca()
ax.set_xlabel('Number of Visas in 1988')
ax.set_ylabel('Number of Visas in 1991')
# Fix the axis ranges
ax.set_xlim([0, 5000])
ax.set_ylim([0, 2700])
ax.set_title('ScatterPlot of Visa91 vs Visa88')
plt.show()


Scatterplot (p.18)

import matplotlib.pyplot as plt
import pandas as pd

# Raw string so the backslashes in the Windows path are taken literally
adoption_scatterplot = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\adopt.TAB', sep='\t')
adopt_data_Xax = adoption_scatterplot['Visa91']
adopt_data_Yax = adoption_scatterplot['Visa92']

plt.scatter(adopt_data_Xax, adopt_data_Yax, facecolor='y', edgecolor='y')
ax = plt.gca()
ax.set_xlabel('Number of Visas in 1991')
ax.set_ylabel('Number of Visas in 1992')
# Fix the axis ranges
ax.set_xlim([0, 2700])
ax.set_ylim([0, 1800])
ax.set_title('ScatterPlot of Visa92 vs Visa91')
plt.show()

python/first_course_statistics.1477312198.txt.gz · Last modified: 2016/10/24 14:29 by Beretta, Anna Letizia