Diff

Shows the differences between two versions of this page.

--- python:numpy_scipy_matplotlib [2019/07/29 09:46] – ともやん
+++ python:numpy_scipy_matplotlib [2020/04/16 03:38] (current) – ともやん
@@ Line 1: / Line 1: @@
-<ifauth !@loggedinusers><html>
+<html>
-<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
+  <style>
-<script>
+    #result pre {
-     (adsbygoogle = window.adsbygoogle || []).push({
+      height: 200px;
-          google_ad_client: "ca-pub-0791334967460971",
+      overflow: scroll;
-          enable_page_level_ads: true
+      overflow-x: hidden;
-     });
+    }
-</script></html></ifauth>
+    #logo_numpy {
+      background-color: #a2bae8;
+      width: fit-content;
+      padding: 10px;
+    }
+    #logo_numpy p {
+      margin: 0;
+    }
+  </style>
+</html>
 ====== Installing NumPy, SciPy, Matplotlib, CuPy, and Pandas ======
-===== Installing NumPy, SciPy, Matplotlib, and CuPy =====
+====== Installing NumPy, SciPy, Matplotlib, and CuPy ======
-==== Linux ====
+===== Linux =====
 <code>
 $ pip3 install numpy scipy matplotlib cupy
 </code>
-==== Windows ====
+===== Windows =====
 <code>
 > pip install numpy scipy matplotlib cupy
@@ Line 22: / Line 31: @@
 Note: Installing CuPy requires [[windows:visualstudio|Visual Studio 2019]] and the [[windows:windows10_install_cuda|CUDA Toolkit]] to be installed.\\
-==== NumPy ====
+====== NumPy ======
-{{:python:numpy_logo.svg?200|NumPy Logo}}\\
+<WRAP #logo_numpy>
-[[https://numpy.org/|NumPy — NumPy]]\\
+{{:python:numpy_logo.svg?200|NumPy Logo}}
+</WRAP>
+Official site: [[https://numpy.org/|NumPy — NumPy]]\\
+Source code: [[https://github.com/numpy/numpy|GitHub - numpy/numpy: The fundamental package for scientific computing with Python.]]\\
+<code>
+$ git clone https://github.com/numpy/numpy.git
+</code>
+Reference: [[https://docs.scipy.org/doc/numpy/reference/|NumPy Reference]]\\
 \\
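 **NumPy** is an extension module for the Python programming language for efficient numerical computation. It adds support for typed multidimensional arrays (which can represent vectors and matrices, for example) and provides a large library of high-level mathematical functions for operating on them.\\
 From [[https://ja.wikipedia.org/wiki/NumPy|NumPy - Wikipedia]]\\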
-==== SciPy ====
+====== SciPy ======
 [[https://www.scipy.org/|SciPy.org — SciPy.org]]\\
 \\
@@ Line 35: / Line 51: @@
 From [[https://ja.wikipedia.org/wiki/SciPy|SciPy - Wikipedia]]\\
-==== Matplotlib ====
+====== Matplotlib ======
 [[https://matplotlib.org/|Matplotlib: Python plotting — Matplotlib 3.1.1 documentation]]\\
 \\
@@ Line 41: / Line 57: @@
 From [[https://ja.wikipedia.org/wiki/Matplotlib|matplotlib - Wikipedia]]\\
-==== CuPy ====
+====== CuPy ======
 {{:python:cupy_logo.png?200|CuPy Logo}}\\
-[[https://cupy.chainer.org/|CuPy]]\\
+Official site: [[https://cupy.chainer.org/|CuPy]]\\
+Source code: [[https://github.com/cupy/cupy|GitHub - cupy/cupy: NumPy-like API accelerated with CUDA]]\\
+<code>
+$ git clone https://github.com/cupy/cupy.git
+</code>
+Reference: [[https://docs-cupy.chainer.org/en/stable/reference/|Reference Manual — CuPy 6.2.0 documentation]]\\
 \\
 **CuPy** is an implementation of a NumPy-compatible multidimensional array on CUDA. CuPy consists of cupy.ndarray, the core multidimensional array class, and many functions that operate on it. It supports a subset of the numpy.ndarray interface.\\
 From [[https://docs-cupy.chainer.org/en/stable/overview.html|Overview — CuPy 6.2.0 documentation]]\\
-===== Installing pandas =====
+====== Installing Pandas ======
+Official site: [[https://pandas.pydata.org/|Python Data Analysis Library — pandas: Python Data Analysis Library]]\\
+\\
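+**Pandas** is a library for the Python programming language that supports data analysis. In particular, it provides data structures and operations for manipulating numerical tables and time series data. Pandas is distributed under the BSD license.\\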
-==== Linux ====
+===== Linux =====
 <code>
 $ pip3 install pandas
 </code>
-==== Windows ====
+===== Windows =====
 <code>
 > pip install pandas
 </code>
-<WRAP prewrap 100%>
+<WRAP prewrap 100% #result>
 <code>
 Collecting pandas
@@ Line 72: / Line 96: @@
 Installing collected packages: pytz, pandas
 Successfully installed pandas-0.25.0 pytz-2019.1
+</code>
+</WRAP>
+===== Usage =====
+==== Loading a CSV whose rows have different column counts into a DataFrame ====
+<file csv data.csv>
+col1	col2	col3	col4	col5
+1	2	3	4
+1	2	3
+1	2	3	4	5
+</file>
+<code python>
+In [1]: import csv
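+# Note: csv.reader expects an iterable of lines (such as an open file), not a file name; given a string it iterates over its characters
+In [2]: reader = csv.reader('data.csv')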
+In [3]: reader
+Out[3]: <_csv.reader at 0x22681e467b8>
+In [4]: with open('data.csv') as fp:
+    ...:     reader = csv.reader(fp, delimiter='\t')
+    ...:     data = [row for row in reader]
+    ...:
+In [5]: data
+Out[5]:
+[['col1', 'col2', 'col3', 'col4', 'col5'],
+ ['1', '2', '3', '4'],
+ ['1', '2', '3'],
+ ['1', '2', '3', '4', '5']]
+In [6]: header, values = data[0], data[1:]
+In [7]: import pandas as pd
+In [8]: df = pd.DataFrame(values, columns=header)
+In [9]: print(df)
+  col1 col2 col3  col4  col5
+0    1    2    3     4  None
+1    1    2    3  None  None
+2    1    2    3     4     5
+</code>
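The csv.reader steps above can be wrapped into a small helper. This is only a minimal sketch, not the page author's method: the function name `read_ragged_tsv` and the in-memory `RAGGED_TSV` string (standing in for data.csv and mirroring the rows shown in Out[5]) are assumptions made for illustration. The `pd.to_numeric` conversion is an optional extra step, added because csv.reader yields strings.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for data.csv: tab-separated rows of uneven length.
RAGGED_TSV = (
    "col1\tcol2\tcol3\tcol4\tcol5\n"
    "1\t2\t3\t4\n"
    "1\t2\t3\n"
    "1\t2\t3\t4\t5\n"
)


def read_ragged_tsv(fp):
    """Read tab-separated rows of uneven length into a numeric DataFrame."""
    rows = list(csv.reader(fp, delimiter="\t"))
    header, values = rows[0], rows[1:]
    # The DataFrame constructor pads the shorter rows with missing values.
    df = pd.DataFrame(values, columns=header)
    # csv.reader yields strings, so convert each column to numbers;
    # missing entries become NaN.
    return df.apply(pd.to_numeric)


df = read_ragged_tsv(io.StringIO(RAGGED_TSV))
print(df)
```

Passing a file opened with `open('data.csv')` instead of the `io.StringIO` object should work the same way. Columns that contain missing values come back as floating point, because NaN cannot be stored in an integer column.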
+==== Grouping a DataFrame (aggregation) ====
+<WRAP prewrap 100%>
+<code python>
+In [1]: import pandas as pd
+In [2]: df = pd.DataFrame(columns=['col1', 'col2', 'col3', 'col4'],
+   ...:                   data=[['aaa', 1, 2, 3],
+   ...:                         ['bbb', 1, 2, 3],
+   ...:                         ['ccc', 1, 2, 3],
+   ...:                         ['aaa', 2, 4, 6],
+   ...:                         ['bbb', 2, 4, 6],
+   ...:                         ['ccc', 2, 4, 6],
+   ...:                         ['aaa', 3, 6, 9],
+   ...:                         ['bbb', 3, 6, 9],
+   ...:                         ['ccc', 3, 6, 9]])
+In [3]: df
+Out[3]:
+  col1  col2  col3  col4
+0  aaa     1     2     3
+1  bbb     1     2     3
+2  ccc     1     2     3
+3  aaa     2     4     6
+4  bbb     2     4     6
+5  ccc     2     4     6
+6  aaa     3     6     9
+7  bbb     3     6     9
+8  ccc     3     6     9
+# By default, groupby uses the group labels as the index, and a Series is returned when a single column is aggregated
+In [4]: grouped = df.groupby(df['col1'])['col2'].sum()
+In [5]: type(grouped)
+Out[5]: pandas.core.series.Series
+In [6]: grouped
+Out[6]:
+col1
+aaa    6
+bbb    6
+ccc    6
+Name: col2, dtype: int64
+In [7]: pd.DataFrame(grouped)
+Out[7]:
+      col2
+col1
+aaa      6
+bbb      6
+ccc      6
+# With as_index=False, the group labels become a column and a DataFrame is returned
+In [8]: grouped = df.groupby(df['col1'], as_index=False)['col2'].sum()
+In [9]: type(grouped)
+Out[9]: pandas.core.frame.DataFrame
+In [10]: grouped
+Out[10]:
+  col1  col2
+0  aaa     6
+1  bbb     6
+2  ccc     6
 </code>
 </WRAP>
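As a complement to the session above, the sketch below shows two other ways to get a flat DataFrame out of a groupby, under the assumption that grouping by the column name `'col1'` is acceptable in place of passing the Series `df['col1']`. The variable names `totals` and `summary` and the output column names `col2_sum`, `col3_mean` and `col4_max` are made up for illustration. Named aggregation requires pandas 0.25 or later.

```python
import pandas as pd

# Same sample data as in the session above.
df = pd.DataFrame(
    columns=["col1", "col2", "col3", "col4"],
    data=[
        ["aaa", 1, 2, 3], ["bbb", 1, 2, 3], ["ccc", 1, 2, 3],
        ["aaa", 2, 4, 6], ["bbb", 2, 4, 6], ["ccc", 2, 4, 6],
        ["aaa", 3, 6, 9], ["bbb", 3, 6, 9], ["ccc", 3, 6, 9],
    ],
)

# Option 1: aggregate to a Series indexed by group label, then move the
# labels back into an ordinary column with reset_index().
totals = df.groupby("col1")["col2"].sum().reset_index()
print(totals)

# Option 2: named aggregation computes several statistics at once;
# each keyword is an output column, each value is (input column, function).
summary = df.groupby("col1", as_index=False).agg(
    col2_sum=("col2", "sum"),
    col3_mean=("col3", "mean"),
    col4_max=("col4", "max"),
)
print(summary)
```

`reset_index()` is convenient when the Series result already exists, while named aggregation keeps the output column names explicit when more than one statistic is needed.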