python:first_course_statistics

Differences

Below are the differences between two revisions of the page.

python:first_course_statistics [2016/10/09 23:21]
Francesco Beretta [General instructions]
python:first_course_statistics [2016/10/19 14:57]
Beretta, Anna Letizia
Line 13: Line 13:
  
 ===== Histogram (p.5) ===== ===== Histogram (p.5) =====
- 
-FB: this script works fine ! 
  
 <code python> <code python>
Line 31: Line 29:
  
 \\ \\
 +
 +===== Boxplot (p. 6) =====
 +
 +<code python>
 +import matplotlib.pyplot as plt
 +import pandas as pd
 +# Raw string keeps the Windows backslash literal; the path is elided in the original
 +gysr1_boxplot = pd.read_csv(r'...\geyser1.TAB', sep='\t')
 +data_gysr1 = gysr1_boxplot['Interval']
 +plt.boxplot(data_gysr1)
 +ax = plt.gca()
 +ax.set_xlabel('222 cases')
 +ax.set_ylabel('Interruption time (minutes)')
 +ax.set_title('Box and Whisker Plot')
 +plt.show()
 +</code>
 +
 +
 +\\
 +
 +===== ScatterPlot (p. 7) =====
 +
 +AB: Set both facecolor and edgecolor to change the color of the whole marker. You can also use two different colors for the inside and the outline of each dot.
 +
 +<code python>
 +import matplotlib.pyplot as plt
 +import pandas as pd
 +# Raw string keeps the Windows backslash literal; the path is elided in the original
 +geysr1_scatterplot = pd.read_csv(r'...\geyser1.TAB', sep='\t')
 +geysr1_data_Xax = geysr1_scatterplot['Duration']
 +geysr1_data_Yax = geysr1_scatterplot['Interval']
 +# Same color for fill and outline gives uniformly yellow dots
 +plt.scatter(geysr1_data_Xax, geysr1_data_Yax, facecolor='y', edgecolor='y')
 +ax = plt.gca()
 +ax.set_xlabel('Eruption duration time (minutes)')
 +ax.set_ylabel('Interruption time (minutes)')
 +ax.set_title('Scatter Plot of INTERVAL vs DURATION')
 +plt.show()
 +</code>
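The remark about using two different colors can be sketched as follows. This is only an illustration: the `sample` values are hypothetical stand-ins, not the real geyser1.TAB data, and the color choices ('y' for the fill, 'k' for the outline) are arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in values, NOT the real geyser1.TAB data
sample = pd.DataFrame({'Duration': [1.8, 2.0, 3.9, 4.3, 4.6],
                       'Interval': [54, 57, 76, 82, 88]})

fig, ax = plt.subplots()
# facecolor fills the inside of each dot, edgecolor draws its outline
points = ax.scatter(sample['Duration'], sample['Interval'],
                    facecolor='y', edgecolor='k')
ax.set_xlabel('Eruption duration time (minutes)')
ax.set_ylabel('Interruption time (minutes)')
ax.set_title('Scatter Plot of INTERVAL vs DURATION')
# Call plt.show() here to display the figure interactively
```

Passing the same value to both arguments reproduces the single-color dots of the script above.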
 +
 +
 +\\
 +
 +
 +===== Descriptive statistics (p.9) =====
 +
 +Note: try different selections, e.g. one column for the whole population, only the rows where 'Duration' <= 3, or the whole dataframe.
 +
 +[[http://pandas.pydata.org/pandas-docs/stable/basics.html#descriptive-statistics|doc]] – [[http://www.marsja.se/pandas-python-descriptive-statistics/|example]]
 +
 +<code python>
 +import pandas as pd
 +gysr1 = pd.read_csv('../geyser1.tab', sep='\t')
 +# Summary statistics of 'Duration' for the rows where 'Duration' <= 3
 +print(gysr1.loc[gysr1['Duration'] <= 3, 'Duration'].describe())
 +</code>
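The variations suggested in the note could look like the sketch below. The `sample` dataframe is a hypothetical stand-in so the snippet runs without the data file; with the real data, replace it with the dataframe read from geyser1.tab.

```python
import pandas as pd

# Hypothetical stand-in values, NOT the real geyser1.tab data
sample = pd.DataFrame({'Duration': [1.8, 2.0, 3.0, 4.3, 4.6],
                       'Interval': [54, 57, 60, 82, 88]})

# Whole dataframe: one column of statistics per numeric column
whole_frame = sample.describe()

# Whole population, single column
whole_column = sample['Duration'].describe()

# Only the rows where 'Duration' <= 3
short_only = sample.loc[sample['Duration'] <= 3, 'Duration'].describe()

print(whole_frame)
print(whole_column)
print(short_only)
```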
 +
 +
 +\\
 +
 +
 +===== Boxplot (p.9) =====
 +
 +Selecting rows in a dataframe: [[http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-where-method-and-masking|doc]] / [[http://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas|example]]
 +
 +<code python>
 +import matplotlib.pyplot as plt
 +import pandas as pd
 +gysr1 = pd.read_csv('../geyser1.tab', sep='\t')
 +# Split the rows into short (<= 3) and long (> 3) eruptions
 +gysr1_inf3 = gysr1.loc[gysr1['Duration'] <= 3]
 +gysr1_sup3 = gysr1.loc[gysr1['Duration'] > 3]
 +# One box per group, side by side
 +plt.boxplot([gysr1_inf3['Interval'], gysr1_sup3['Interval']], labels=['inf3', 'sup3'])
 +plt.show()
 +</code>
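As a minimal sketch of the row selection used above, the snippet below builds a boolean mask once and reuses it, then shows `DataFrame.query` as an equivalent alternative. The `sample` values are hypothetical stand-ins, not the real geyser1.tab data.

```python
import pandas as pd

# Hypothetical stand-in values, NOT the real geyser1.tab data
sample = pd.DataFrame({'Duration': [1.8, 2.0, 3.0, 4.3, 4.6],
                       'Interval': [54, 57, 60, 82, 88]})

# Boolean mask: True for each row where 'Duration' <= 3
mask = sample['Duration'] <= 3

inf3 = sample.loc[mask]    # rows where the mask is True
sup3 = sample.loc[~mask]   # complementary rows ('Duration' > 3)

# Same selection written as a query string
inf3_query = sample.query('Duration <= 3')

print(len(inf3), len(sup3))
```

Negating the mask with `~` guarantees the two groups cover every row exactly once, provided the column has no missing values.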
 +
 +
 +\\
 +
  
 ====== International adoption rates (p.13) ====== ====== International adoption rates (p.13) ======
  
 +===== Boxplot (p.14) =====
 +
 +<code python>
 +import matplotlib.pyplot as plt
 +import pandas as pd
 +# Raw string so the backslashes in the Windows path are not read as escape sequences
 +adopt_data = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\adopt.TAB', sep='\t')
 +adopt1 = adopt_data['Visa91']
 +plt.boxplot(adopt1)
 +ax = plt.gca()
 +ax.set_title('Box and Whisker Plot')
 +ax.set_xlabel('39 cases')
 +ax.set_ylabel('Number of visas in 1991')
 +plt.show()
 +</code>
 +
 +
 +\\
 +
 +
 +===== Histogram (p.14) =====
 +
 +<code python>
 +import matplotlib.pyplot as plt
 +import pandas as pd
 +# Raw string so the backslashes in the Windows path are not read as escape sequences
 +adopt_data = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\adopt.TAB', sep='\t')
 +adopt1 = adopt_data['Visa91']
 +plt.hist(adopt1)
 +plt.show()
 +</code>
python/first_course_statistics.txt · Last modified: 2017/09/26 08:54 by Francesco Beretta