Differences

The differences between two revisions of the page are shown below.

--- python:python_www [2019/11/14 17:24]
Francesco Beretta [HTML and complementary technologies]
+++ python:python_www [2019/11/17 12:13]
Francesco Beretta [Retrieving HTML pages and converting them to XML]
@@ Line 12: / Line 12: @@
-[[http://adrien.barbaresi.eu/blog/parsing-converting-lxml-html-tei.html|Parsing and converting HTML documents to XML/TEI format using Python’s lxml]]
+  * LXML
+    * [[http://adrien.barbaresi.eu/blog/parsing-converting-lxml-html-tei.html|Parsing and converting HTML documents to XML/TEI format using Python’s lxml]]
+    * [[https://pythontips.com/2018/06/20/an-intro-to-web-scraping-with-lxml-and-python/|Tutorial with an example]]
+      * [[https://www.youtube.com/watch?v=5N066ISH8og|Video of the same tutorial]]
+  * BeautifulSoup
+    * [[https://programminghistorian.org/en/lessons/intro-to-beautiful-soup|Programming historian: Intro to Beautiful Soup]]
+  * Trafilatura
+    * A new library still under development: useful and ready to use out of the box, though sometimes somewhat limited in the options it offers (depending on the complexity of the HTML page)
+    * [[https://github.com/adbar/trafilatura|Trafilatura on GitHub]]
+    * [[http://adrien.barbaresi.eu/blog/trafilatura-main-text-content-python.html|Extracting the main text content from web pages using Python]]
+  * [[https://scrapy.org/|Scrapy]]
+  * YouTube: [[https://www.youtube.com/watch?v=ve_0h4Y8nuI&list=PLhTjy8cBISEqkN-5Ku_kXG4QW33sxQo0t|Complete tutorial]]
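
The resources listed above all deal with turning HTML pages into XML. As a rough illustration of that idea, here is a minimal sketch that deliberately uses only the Python standard library (`html.parser` and `xml.etree.ElementTree`) rather than lxml, BeautifulSoup, Trafilatura or Scrapy. The class `ParagraphCollector`, the function `html_to_xml` and the output element names (`document`, `title`, `body`, `p`) are hypothetical names chosen for this example; they do not come from the page or from any of the listed libraries.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ParagraphCollector(HTMLParser):
    """Collect the text of <title> and <p> elements from an HTML string."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # tag currently being collected, or None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Start collecting text when a <title> or <p> opens.
        if tag in ("title", "p"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested inline tags (<b>, <i>, ...) is collected too.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Normalise runs of whitespace to single spaces.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            elif text:
                self.paragraphs.append(text)
            self._current = None


def html_to_xml(html):
    """Return a small XML document (as a string) built from the HTML text."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    root = ET.Element("document")
    ET.SubElement(root, "title").text = collector.title
    body = ET.SubElement(root, "body")
    for paragraph in collector.paragraphs:
        ET.SubElement(body, "p").text = paragraph
    return ET.tostring(root, encoding="unicode")


sample = (
    "<html><head><title>Demo</title></head>"
    "<body><p>First <b>bold</b> text.</p><p>Second &amp; last.</p></body></html>"
)
print(html_to_xml(sample))
```

This sketch only handles simple, non-nested paragraphs. For real pages, the libraries listed above are better suited, since they cope with malformed markup and offer XPath or CSS selection.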

ARHN wiki (Axe de recherche en histoire numérique, the digital history research axis), LARHRA UMR5190
