python:first_course_statistics

Differences

Below are the differences between two revisions of the page.

python:first_course_statistics [2016/10/28 14:36]
Beretta, Anna Letizia
python:first_course_statistics [2017/09/26 08:54] (current version)
Francesco Beretta [General instructions]
Line 5: Line 5:
   * pandas [[http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe|dataframes]]
   * [[http://matplotlib.org/api/pyplot_summary.html|matplotlib.pyplot]]
 +
 +
 +Get the data from [[http://people.stern.nyu.edu/jsimonof/Casebook/Data/ASCII/README.html|this site]].
 +
  
 Save your scripts in a folder inside the data folder, naming the script folder 'my_scripts' or whatever you like. If 'my_scripts' is set as your [[python:generic_features#get_the_current_working_directory_address|current working directory]], then the data files are available at the relative path '../[data file]', for instance: '../geyser1.TAB'
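The relative address described above can also be built programmatically. The following is a minimal sketch, assuming the layout described here (a script folder sitting directly inside the data folder). The helper name `data_file` and the example folder `/data/my_scripts` are hypothetical and do not come from the original page.

```python
from pathlib import Path


def data_file(file_name, scripts_dir=None):
    """Return the path of a data file stored one level above the script folder.

    scripts_dir defaults to the current working directory, which is assumed
    to be the script folder (for example 'my_scripts').
    """
    base = Path(scripts_dir) if scripts_dir is not None else Path.cwd()
    return base.parent / file_name


# Hypothetical location, used only to show how the path is assembled.
example = data_file('geyser1.TAB', scripts_dir='/data/my_scripts')
print(example)
```

The returned path object can be passed to `pd.read_csv` as is, since pandas accepts path objects as well as strings.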
Line 35: Line 39:
 import matplotlib.pyplot as plt
 import pandas as pd
-gysr1_boxplot = pd.read_csv('...\geyser1.TAB', '\t')
+gysr1_boxplot = pd.read_csv('.../geyser1.TAB', '\t')
 data_gysr1 = gysr1_boxplot['Interval']
 plt.boxplot(data_gysr1)
Line 55: Line 59:
 import matplotlib.pyplot as plt
 import pandas as pd
-geysr1_scatterplot = pd.read_csv('...\geyser1.TAB', '\t')
+geysr1_scatterplot = pd.read_csv('.../geyser1.TAB', '\t')
 geysr1_data_Xax = geysr1_scatterplot['Duration']
 geysr1_data_Yax = geysr1_scatterplot['Interval']
Line 242: Line 246:
 =====Scatter Plot of PRODJAPN vs QUALJAPN (p. 27) =====
  
 +<code python>
 +import pandas as pd
 +import matplotlib.pyplot as plt
 +# Raw string so the backslashes in the Windows path are kept literally
 +scatter_plot = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\prdq.TAB', sep='\t')
 +productivity_Y = scatter_plot['ProdJapn']
 +quality_X = scatter_plot['QualJapn']
 +# plt.scatter takes x first, then y; it accepts 'color' and has no 'bins' argument
 +plt.scatter(quality_X, productivity_Y, color='r')
 +ax = plt.gca()
 +ax.set_xlabel('Assembly defects per 100 cars (Japanese origin)')
 +ax.set_ylabel('Hours per vehicle (Japanese origin)')
 +ax.set_title('Scatter Plot of PRODJAPN VS QUALJAPN')
 +plt.show()
 +</code>
 +
 +
 +=====Scatter Plot of PRODNONJ vs QUALNONJ (p. 27)=====
 +<code python>
 +import pandas as pd
 +import matplotlib.pyplot as plt
 +# Raw string so the backslashes in the Windows path are kept literally
 +scatter_plot = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\prdq.TAB', sep='\t')
 +productivity_Y = scatter_plot['ProdNonJ']
 +quality_X = scatter_plot['QualNonJ']
 +# plt.scatter takes x first, then y; it accepts 'color' and has no 'bins' argument
 +plt.scatter(quality_X, productivity_Y, color='r')
 +ax = plt.gca()
 +ax.set_xlabel('Assembly defects per 100 cars (non-Japanese origin)')
 +ax.set_ylabel('Hours per vehicle (non-Japanese origin)')
 +ax.set_title('Scatter Plot of PRODNONJ VS QUALNONJ')
 +plt.show()
 +</code>
 +
 +
 +
 +===== Scatterplot of productivity VS quality (p. 28) =====
 +<code python>
 +import pandas as pd
 +import matplotlib.pyplot as plt
 +# Raw string so the backslashes in the Windows path are kept literally
 +scatter_plot = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\prdq.TAB', sep='\t')
 +productivity_Y = scatter_plot['Producti']
 +quality_X = scatter_plot['Quality']
 +# plt.scatter takes x first, then y; it accepts 'color' and has no 'bins' argument
 +plt.scatter(quality_X, productivity_Y, color='r')
 +ax = plt.gca()
 +ax.set_xlabel('Assembly defects per 100 cars')
 +ax.set_ylabel('Hours per vehicle')
 +ax.set_title('Scatter Plot of PRODUCTIVITY VS QUALITY')
 +plt.show()
 +</code>
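The three scatter plots above differ only in the column names and labels, so the shared steps could be wrapped in one function. This is a sketch rather than part of the original page: `scatter_columns` is a hypothetical helper, and the small DataFrame holds made-up numbers standing in for prdq.TAB.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_columns(frame, x_col, y_col, x_label, y_label, title):
    """Draw a red scatter plot of frame[y_col] against frame[x_col]."""
    fig, ax = plt.subplots()
    ax.scatter(frame[x_col], frame[y_col], color='r')
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.set_title(title)
    return ax


# Made-up numbers for illustration only; they are not taken from prdq.TAB.
demo = pd.DataFrame({'Quality': [50.0, 65.0, 80.0],
                     'Producti': [17.0, 21.0, 30.0]})
ax = scatter_columns(demo, 'Quality', 'Producti',
                     'Assembly defects per 100 cars',
                     'Hours per vehicle',
                     'Scatter Plot of PRODUCTIVITY VS QUALITY')
```

Calling `plt.show()` afterwards displays the figure. With the real file, `demo` would be replaced by the DataFrame returned by `pd.read_csv`.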
 +
 +
 +===== Productivity versus quality in the assembly plant (p.29) =====
 +
 +It worked the first time, but now it no longer works. Perhaps another Windows error?
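On the Windows question: backslashes inside an ordinary Python string can be read as escape sequences, which silently changes a path. In the path used on this page none of the backslash pairs appears to be a recognised escape, so this is probably not the cause here, but the pattern is fragile. A short sketch with a hypothetical path (`D:\temp\new.TAB` is made up for the demonstration):

```python
from pathlib import PureWindowsPath

# Hypothetical path, chosen because it contains \t and \n.
plain = 'D:\temp\new.TAB'   # \t becomes a tab, \n becomes a newline
raw = r'D:\temp\new.TAB'    # raw string: backslashes are kept as written

print(len(plain), len(raw))

# Building the path from parts avoids typing backslashes at all.
as_path = PureWindowsPath('D:/temp') / 'new.TAB'
print(str(as_path) == raw)
```

Prefixing the string with `r`, or using forward slashes, keeps the path exactly as typed.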
 +
 +<code python>
 +#1
 +import matplotlib.pyplot as plt
 +import pandas as pd
 +data_comparison = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\prdq.TAB', sep='\t')
 +# Assumption: each group column holds that group's values, with blanks where a group has fewer rows.
 +# .loc[...] with a numeric column would treat the values as row labels, so select the columns directly.
 +non_japanese = data_comparison['QualNonJ'].dropna()
 +japanese = data_comparison['QualJapn'].dropna()
 +plt.boxplot([non_japanese, japanese])
 +plt.xticks([1, 2], ['Non-Japanese', 'Japanese'])
 +plt.show()
 +
 +#2
 +import matplotlib.pyplot as plt
 +import pandas as pd
 +data_comparison = pd.read_csv(r'D:\Python\Libri\A_Casebook_for_a_First_Course_in_Statistics_and_Data_Analysis_Datasets\Data\Tab\prdq.TAB', sep='\t')
 +# Same assumption as in #1, applied to the productivity columns
 +non_japanese = data_comparison['ProdNonJ'].dropna()
 +japanese = data_comparison['ProdJapn'].dropna()
 +plt.boxplot([non_japanese, japanese])
 +plt.xticks([1, 2], ['Non-Japanese', 'Japanese'])
 +plt.show()
 +</code>
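One possible explanation for the error reported in this section, unrelated to Windows: when `.loc[...]` is given a column of numbers, it treats those numbers as row labels rather than as a filter. The sketch below uses made-up values (not taken from prdq.TAB) and assumes the group columns contain blanks where one group has fewer plants. In recent pandas versions the label lookup raises a `KeyError`.

```python
import pandas as pd

# Made-up values for illustration; the real columns come from prdq.TAB.
demo = pd.DataFrame({'QualJapn': [55.0, 60.0, 45.0],
                     'QualNonJ': [82.0, 78.0, None]})

# The values 55.0, 60.0, 45.0 are looked up as row labels,
# but the rows are labelled 0, 1, 2.
try:
    demo.loc[demo['QualJapn']]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Selecting the columns directly and dropping blanks gives one series per group.
japanese = demo['QualJapn'].dropna()
non_japanese = demo['QualNonJ'].dropna()
print(lookup_failed, len(japanese), len(non_japanese))
```

The two resulting series can be passed to `plt.boxplot` as a list to draw the boxes side by side.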
python/first_course_statistics.1477658163.txt.gz · Last modified: 2016/10/28 14:36 by Beretta, Anna Letizia