Differences

Below are the differences between two revisions of the page.

--- python:first_course_statistics [2016/10/09 23:16]
Francesco Beretta [Eruptions of the Old Faithful geyser (p.5)]
+++ python:first_course_statistics [2016/10/26 19:34]
Beretta, Anna Letizia
@@ Line 3: / Line 3: @@
 Read the following important documentation about:
-  * pandas: accessing dataframes (tables)
+  * pandas [[http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe|dataframes]]
   * [[http://matplotlib.org/api/pyplot_summary.html|matplotlib.pyplot]]
@@ Line 13: / Line 13: @@
 ===== Histogram (p.5) =====
-FB: this script works fine !
 <code python>
@@ Line 31: / Line 29: @@
 \\
+===== Boxplot (p. 6) =====
+<code python>
+import matplotlib.pyplot as plt
+import pandas as pd
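+# raw string keeps the backslash literal; recent pandas versions accept sep only as a keyword
+gysr1_boxplot = pd.read_csv(r'...\geyser1.TAB', sep='\t')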
+data_gysr1 = gysr1_boxplot['Interval']
+plt.boxplot(data_gysr1)
+ax = plt.gca()
+ax.set_xlabel('222 cases')
+ax.set_ylabel('Interruption time (minutes)')
+ax.set_title('Box and Whisker Plot')
+plt.show()
+</code>
+\\
+===== ScatterPlot (p. 7) =====
+AB: Set both facecolor and edgecolor to change the fill and the outline of each dot. Giving them different values colors the inside and the outline of each dot differently.
+<code python>
+import matplotlib.pyplot as plt
+import pandas as pd
+geysr1_scatterplot = pd.read_csv(r'...\geyser1.TAB', sep='\t')
+geysr1_data_Xax = geysr1_scatterplot['Duration']
+geysr1_data_Yax = geysr1_scatterplot['Interval']
+plt.scatter(geysr1_data_Xax, geysr1_data_Yax, facecolor='y', edgecolor='y')
+ax = plt.gca()
+ax.set_xlabel('Eruption duration time (minutes)')
+ax.set_ylabel('Interruption time (minutes)')
+ax.set_title('Scatter Plot of INTERVAL vs DURATION')
+plt.show()
+</code>
+\\
+===== Descriptive statistics (p.9) =====
+Note: try different subsets, e.g. the whole dataframe, or only the rows where 'Duration' <= 3.
+[[http://pandas.pydata.org/pandas-docs/stable/basics.html#descriptive-statistics|doc]] – [[http://www.marsja.se/pandas-python-descriptive-statistics/|example]]
+<code python>
+import pandas as pd
+gysr1 = pd.read_csv('../geyser1.tab', sep='\t')
+gysr1['Duration'][gysr1['Duration'] <= 3].describe()
+</code>
+\\
+===== Boxplot (p.9) =====
+Selecting rows in a dataframe: [[http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-where-method-and-masking|doc]] / [[http://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas|example]]
+<code python>
+import matplotlib.pyplot as plt
+import pandas as pd
+gysr1 = pd.read_csv('../geyser1.tab', sep='\t')
+gysr1_inf3 = gysr1.loc[gysr1['Duration'] <= 3]
+gysr1_sup3 = gysr1.loc[gysr1['Duration'] > 3]
+plt.boxplot([gysr1_inf3['Interval'], gysr1_sup3['Interval']], labels=['inf3', 'sup3'])
+plt.show()
+</code>
+\\
 ====== International adoption rates (p.13) ======
+===== Boxplot (p.14) =====
+<code python>
+import matplotlib.pyplot as plt
+import pandas as pd
+adopt_data = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\adopt.TAB', sep='\t')
+adopt1 = adopt_data['Visa91']
+plt.boxplot(adopt1)
+ax = plt.gca()
+ax.set_title('Box and Whisker Plot')
+ax.set_xlabel('39 cases')
+ax.set_ylabel('Number of visas in 1991')
+plt.show()
+</code>
+\\
+===== Histogram (p.14) =====
+<code python>
+import matplotlib.pyplot as plt
+import pandas as pd
+adopt_data = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\adopt.TAB', sep='\t')
+adopt1 = adopt_data['Visa91']
+plt.hist(adopt1)
+plt.show()
+</code>
+\\
+===== Histogram with Log (p.18) =====
+Not solved yet: this version runs, but it only puts the x-axis on a log scale; the bins are still equal-width in the raw visa counts.
+<code python>
+import pandas as pd
+import matplotlib.pyplot as plt
+adopt = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\adopt.TAB', sep='\t')
+adopt_loghist = adopt['Visa91']
+#adopt_loghist.semilogx() --> was one of the possibilities
+ax = plt.gca()
+# log=True would put the frequencies (y-axis), not the visa counts, on a log scale
+ax.hist(adopt_loghist, bins=10, color='r')
+ax.set_xscale("log")
+ax.set_xlabel('Log (Number of 1991 visas)')
+ax.set_ylabel('Frequency')
+ax.set_title('Histogram')
+plt.show()
+</code>
+===== Scatterplot (p. 17) =====
+<code python>
+import matplotlib.pyplot as plt
+import pandas as pd
+# raw string: without it, '\a' would be read as the bell character
+adoption_scatterplot = pd.read_csv(r'...\adopt.TAB', sep='\t')
+adopt_data_Xax = adoption_scatterplot['Visa88']
+adopt_data_Yax = adoption_scatterplot['Visa91']
+plt.scatter(adopt_data_Xax, adopt_data_Yax, facecolor='y', edgecolor='y')
+ax = plt.gca()
+ax.set_xlabel('Number of Visas in 1988')
+ax.set_ylim([0,2700])
+ax.set_xlim([0,5000])
+ax.set_ylabel('Number of Visas in 1991')
+ax.set_title('ScatterPlot of Visa91 vs Visa88')
+plt.show()
+</code>
+\\
+===== Scatterplot (p.18) =====
+<code python>
+import matplotlib.pyplot as plt
+import pandas as pd
+adoption_scatterplot = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\adopt.TAB', sep='\t')
+adopt_data_Xax = adoption_scatterplot['Visa91']
+adopt_data_Yax = adoption_scatterplot['Visa92']
+plt.scatter(adopt_data_Xax, adopt_data_Yax, facecolor='y', edgecolor='y')
+ax = plt.gca()
+ax.set_xlabel('Number of Visas in 1991')
+ax.set_ylim([0,1800])
+ax.set_xlim([0,2700])
+ax.set_ylabel('Number of Visas in 1992')
+ax.set_title('ScatterPlot of Visa92 vs Visa91')
+plt.show()
+</code>
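
For the "Histogram with Log (p.18)" section, one possible approach is to transform the values before binning rather than rescaling the axis. The sketch below is only an illustration: the `visas` values are made-up stand-ins for the `Visa91` column, base 10 is an assumption (the base used in the book is not stated here), and non-positive values are dropped because their logarithm is undefined.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for adopt['Visa91']; the real values come from adopt.TAB.
visas = pd.Series([3, 12, 45, 60, 150, 320, 700, 1500, 2500], name="Visa91")

# Take the log of the data first, so the bins are equal-width in log10 units.
# Non-positive values are dropped because log10 is undefined for them.
log_visas = np.log10(visas[visas > 0])

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(log_visas, bins=10, color="r")
ax.set_xlabel("Log10 (Number of 1991 visas)")
ax.set_ylabel("Frequency")
ax.set_title("Histogram")
# Call plt.show() here to display the figure interactively.
```

With the real data, `visas` would be replaced by the `Visa91` column of the dataframe. The y-axis still shows plain frequencies, which is what `log=True` would have changed instead.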

Wiki de l'ARHN: Axe de recherche en histoire numérique, LARHRA UMR5190
