Below are the differences between two revisions of the page.
python:first_course_statistics [2016/10/08 13:10] Beretta, Anna Letizia |
python:first_course_statistics [2016/10/17 07:20] Francesco Beretta [Histogram (p.5)] |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== General instructions ====== | ||
+ | Read the following important documentation on: | ||
+ | * pandas [[http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe|dataframes]] | ||
+ | * [[http://matplotlib.org/api/pyplot_summary.html|matplotlib.pyplot]] | ||
+ | Save your scripts in a folder inside the data folder, naming the script folder 'my_scripts' or whatever you like. If 'my_scripts' is set as your [[python:generic_features#get_the_current_working_directory_address|current working directory]], the data files are available at the relative address '../[data file]', for instance: '../geyser1.TAB' | ||
+ | \\ | ||
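The relative address described above can be checked before loading any data. The following is a minimal sketch, assuming the layout described in the instructions (a script folder located directly inside the data folder); the helper name `data_path` is hypothetical and not part of the course material.

```python
import os

def data_path(filename, script_dir=None):
    """Build the path of a data file located one level above the script folder.

    `data_path` is a hypothetical helper; `script_dir` defaults to the
    current working directory.
    """
    base = os.getcwd() if script_dir is None else script_dir
    # os.pardir is '..', so this is the same as writing '../<filename>'
    return os.path.normpath(os.path.join(base, os.pardir, filename))

# Show where Python is currently working and where it would look for the file
print(os.getcwd())
print(data_path('geyser1.TAB'))
print(os.path.exists(data_path('geyser1.TAB')))
```

If the last line prints `False`, the working directory is probably not the script folder, and the relative address '../geyser1.TAB' will not resolve.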
====== Eruptions of the "Old Faithful" geyser (p.5) ====== | ====== Eruptions of the "Old Faithful" geyser (p.5) ====== | ||
+ | \\ | ||
===== Histogram (p.5) ===== | ===== Histogram (p.5) ===== | ||
<code python> | <code python> | ||
- | # fake code – to be deleted | + | import pandas as pd |
- | import csv | + | import matplotlib.pyplot as plt |
- | filename = 'ch02-data.csv' | + | gys1 = pd.read_csv('../geyser1.TAB', sep='\t') |
- | f = open(filename) | + | g_int = gys1['Interval'] |
- | data = [] | + | ax = plt.gca() |
- | reader = csv.reader(f) | + | ax.hist(g_int, bins=20, color='r') |
- | header = reader.next() | + | ax.set_xlabel('Intereruption time (minutes)') |
- | data = [row for row in reader] | + | ax.set_ylabel('Frequency') |
- | for datarow in data: | + | ax.set_title('Histogram') |
- | print datarow | + | plt.show() |
</code> | </code> | ||
+ | |||
+ | \\ | ||
+ | |||
+ | ===== Boxplot (p. 6) ===== | ||
+ | |||
+ | <code python> | ||
+ | import matplotlib.pyplot as plt | ||
+ | import pandas as pd | ||
+ | gysr1_boxplot = pd.read_csv(r'...\geyser1.TAB', sep='\t') | ||
+ | data_gysr1 = gysr1_boxplot['Interval'] | ||
+ | plt.boxplot(data_gysr1) | ||
+ | ax = plt.gca() | ||
+ | ax.set_xlabel('222 cases') | ||
+ | ax.set_ylabel('Intereruption time (minutes)') | ||
+ | ax.set_title('Box and Whisker Plot') | ||
+ | plt.show() | ||
+ | </code> | ||
+ | |||
+ | |||
+ | \\ | ||
+ | |||
+ | ===== ScatterPlot (p. 7) ===== | ||
+ | |||
+ | AB: Set both facecolor and edgecolor to change the fill and the outline of the dots together. You can also give them different values, so that the inside and the outline of each dot have different colors. | ||
+ | |||
+ | <code python> | ||
+ | import matplotlib.pyplot as plt | ||
+ | import pandas as pd | ||
+ | geysr1_scatterplot = pd.read_csv(r'...\geyser1.TAB', sep='\t') | ||
+ | geysr1_data_Xax = geysr1_scatterplot['Duration'] | ||
+ | geysr1_data_Yax = geysr1_scatterplot['Interval'] | ||
+ | plt.scatter(geysr1_data_Xax, geysr1_data_Yax, facecolor='y', edgecolor='y') | ||
+ | ax = plt.gca() | ||
+ | ax.set_xlabel('Eruption duration time (minutes)') | ||
+ | ax.set_ylabel('Intereruption time (minutes)') | ||
+ | ax.set_title('Scatter Plot of INTERVAL vs DURATION') | ||
+ | plt.show() | ||
+ | </code> | ||
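The note on face and edge colors can be illustrated with a small sketch in which the two colors differ. The values below are made up for illustration only and are not taken from geyser1.TAB; the output file name is likewise an assumption.

```python
import matplotlib.pyplot as plt

# Made-up illustrative values, NOT read from geyser1.TAB
duration = [1.8, 2.0, 3.5, 4.0, 4.5]
interval = [55, 58, 75, 80, 85]

fig, ax = plt.subplots()
# Yellow fill with a black outline: the inside and the outline of each dot differ
points = ax.scatter(duration, interval, facecolor='y', edgecolor='k')
ax.set_xlabel('Eruption duration time (minutes)')
ax.set_ylabel('Intereruption time (minutes)')
ax.set_title('Scatter plot with separate face and edge colors')
# Hypothetical file name; use plt.show() instead to open a window
fig.savefig('scatter_colors_demo.png')
```

Passing the same value to both arguments, as in the script above, gives dots of a single uniform color.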
====== International adoption rates (p.13) ====== | ====== International adoption rates (p.13) ====== | ||