2021-10-26

Database data types and such

Database


wccftech.com

⇧ How should I put it... I had always thought of Apple as a newcomer to the "chip" business, but

jp.techcrunch.com

⇧ judging from the history on the site above, they really do look like a newcomer, so I get the feeling that the performance of Apple's chips is hitting Intel pretty hard.

As usual, that opening had nothing to do with the topic, but this time I looked into "data types" and related things in "databases".

Let's try~.

Computers process data in "binary" (quantum computers presumably being the exception)

Everyone's favorite, "binary": why did computers end up using binary in the first place?

I'm curious~.

www.bbc.co.uk

Computers use binary - the digits 0 and 1 - to store data. A binary digit, or bit, is the smallest unit of data in computing. It is represented by a 0 or a 1. Binary numbers are made up of binary digits (bits), eg the binary number 1001.

The circuits in a computer's processor are made up of billions of transistors. A transistor is a tiny switch that is activated by the electronic signals it receives. The digits 1 and 0 used in binary reflect the on and off states of a transistor.

Computer programs are sets of instructions. Each instruction is translated into machine code - simple binary codes that activate the CPU. Programmers write computer code and this is converted by a translator into binary instructions that the processor can execute.

All software, music, documents, and any other information that is processed by a computer, is also stored using binary.

https://www.bbc.co.uk/bitesize/guides/zwsbwmn/revision/1

⇧ According to the site above, the circuits of a computer processor are made up of billions of transistors, and each transistor is switched "ON" and "OFF" by the electrical signals it receives.

In other words, binary made sense because the two control states, "ON" and "OFF", map directly onto the two digits "1" and "0".


https://www.bbc.co.uk/bitesize/guides/zwsbwmn/revision/1

Put the other way around, even data written in decimal, octal, or hexadecimal is ultimately processed as binary on today's mainstream computers, I suppose.
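As a small illustration (a Python sketch of my own, not taken from the sources above; the value 201 is arbitrary): binary, octal, decimal, and hexadecimal are just different notations for the same integer, and parsing any of them back yields the same value, which the machine holds as bits.

```python
n = 201

# The same value written in four different bases
representations = {
    "binary": format(n, "b"),
    "octal": format(n, "o"),
    "decimal": format(n, "d"),
    "hexadecimal": format(n, "x"),
}

# Parsing each notation back with its base gives the same integer
for base, text in zip((2, 8, 10, 16), representations.values()):
    assert int(text, base) == n

print(representations)
```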

By the way, I was also curious how octal and hexadecimal came about, and sure enough someone had asked the same question.

stackoverflow.com

⇧ The site above explains it in detail.
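One commonly cited reason octal and hex are handy (my own summary, not a quote from the linked answer): one octal digit corresponds to exactly 3 bits and one hex digit to exactly 4 bits, so a long bit string can be read off in fixed-size groups. A minimal sketch; `group_bits` is a hypothetical helper name.

```python
def group_bits(value: int, group_size: int) -> str:
    """Convert value to octal (group_size=3) or hex (group_size=4) by grouping its bits."""
    bits = format(value, "b")
    # Pad on the left so the bit string splits evenly into groups
    width = -(-len(bits) // group_size) * group_size
    bits = bits.zfill(width)
    groups = [bits[i:i + group_size] for i in range(0, width, group_size)]
    # Each group of bits becomes exactly one digit
    return "".join("0123456789abcdef"[int(g, 2)] for g in groups)


value = 0b11001001  # 201 in decimal
print(group_bits(value, 3), oct(value))
print(group_bits(value, 4), hex(value))
```

Splitting `11001001` into 4-bit groups gives `1100 1001`, i.e. `c9`, matching Python's own `hex()`.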

And while we're at it, for how decimal came about,

nazology.net

⇧ the site above explains it in detail.

As for quantum computers apparently going beyond the scope of binary,

www.kyoto-su.ac.jp

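In conventional computers (which we call "classical computers" in contrast to quantum computers), all information is represented as combinations of 0 and 1. These are called "bits". Inside a classical computer, computation is carried out by repeatedly transforming these bit strings into different states. For example, the computation of adding 1 to 5 to get 6 is understood as a transformation that turns the rightmost 1 of "101", the binary notation of 5, into 0 and the next 0 into 1, yielding "110". Such transformation rules can be built freely out of what are called logic circuits.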

https://www.kyoto-su.ac.jp/project/st/st14_03.html
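The "5 + 1 = 6" bit flip described above can be sketched with the two gates of a half adder: XOR produces the sum bit and AND produces the carry. This is a Python sketch of the idea, not the circuit the quoted site has in mind; `increment_bits` is a hypothetical helper name.

```python
def increment_bits(bits: str) -> str:
    """Add 1 to a binary string using a chain of half adders (XOR for sum, AND for carry)."""
    out = []
    carry = 1  # adding 1 = feeding a carry of 1 into the least significant bit
    for b in reversed(bits):
        a = int(b)
        out.append(str(a ^ carry))  # half-adder sum bit
        carry = a & carry           # half-adder carry bit
    if carry:
        out.append("1")  # overflow: the result needs one more digit
    return "".join(reversed(out))


print(increment_bits("101"))
```

For `"101"` the rightmost 1 becomes 0 with a carry, the carry turns the middle 0 into 1, and the result is `"110"`, exactly the transformation in the quote.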

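The principle behind today's mainstream quantum computers was devised by a physicist named David Deutsch (1953-). A quantum computer is likewise a machine that manipulates "quantum bits", but unlike the classical bits above, a quantum bit can take not only 0 or 1 but also an "intermediate" state. "Intermediate" may bring to mind a value like 0.5, but that is not it at all. It can take the 0 state and the 1 state at the same time. This may seem to defy common sense, but that is because quantum computers are based on the principles of quantum mechanics. Quantum mechanics describes how matter behaves in the microscopic world, and it forces us to accept a picture of the world completely different from our intuition (*see the column on the original page). That is exactly why a quantum computer becomes an entirely different kind of computer from conventional ones.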

https://www.kyoto-su.ac.jp/project/st/st14_03.html

⇧ The site above explains it in detail.

So quantum computers apparently use a mechanism called the "quantum bit" (qubit), which is different from the "bit" used in conventional computers.

From here on, I'll be talking about ordinary computers.

"bit" and "byte"

To recap: to control an ordinary computer, we want to build on the binary system.

What was invented for that purpose is the "bit".

The bit is the most basic unit of information in computing and digital communications. The name is a contraction of binary digit.

https://en.wikipedia.org/wiki/Bit

The bit represents a logical state with one of two possible values. These values are most commonly represented as either "1" or "0", but other representations such as true/false, yes/no, +/−, or on/off are commonly used.

https://en.wikipedia.org/wiki/Bit

History

The encoding of data by discrete bits was used in the punched cards invented by Basile Bouchon and Jean-Baptiste Falcon (1732), developed by Joseph Marie Jacquard (1804), and later adopted by Semyon Korsakov, Charles Babbage, Hermann Hollerith, and early computer manufacturers like IBM.

https://en.wikipedia.org/wiki/Bit

Ralph Hartley suggested the use of a logarithmic measure of information in 1928. Claude E. Shannon first used the word "bit" in his seminal 1948 paper "A Mathematical Theory of Communication". He attributed its origin to John W. Tukey, who had written a Bell Labs memo on 9 January 1947 in which he contracted "binary information digit" to simply "bit". Vannevar Bush had written in 1936 of "bits of information" that could be stored on the punched cards used in the mechanical computers of that time. The first programmable computer, built by Konrad Zuse, used binary notation for numbers.

https://en.wikipedia.org/wiki/Bit

⇧ Apparently the first person to use the term "bit" was Claude Shannon, who came to be called the "father of information theory".

Shannon credited the word to John Tukey, who had shortened "binary information digit" to "bit". Going with "bit" shows real flair, if you ask me.

So a bit can represent one of two states, "0" or "1"; put the other way, it always has to be exactly one of those two. (Representing values with "0" and "1" is what binary is.)

So, for example, 1 bit means picking 1 out of 2 candidates (one "0" and one "1"), which in terms of mathematical "combinations" is

\begin{eqnarray}
{}_2 \mathrm{ C }_1
= \binom{ 2 }{ 1 }
= \frac{ 2! }{ 1! ( 2 - 1 )! }
= 2
\end{eqnarray}

With 2 bits, each bit is an independent choice of 2, so the counts multiply:

\begin{eqnarray}
{}_2 \mathrm{ C }_1 \times {}_2 \mathrm{ C }_1
= \binom{ 2 }{ 1 } \times \binom{ 2 }{ 1 }
= \frac{ 2! }{ 1! ( 2 - 1 )! } \times \frac{ 2! }{ 1! ( 2 - 1 )! }
= 2 \times 2 = 2^2
\end{eqnarray}

and so on, so for n bits we get

$2^n$

possible values.
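The counting argument can also be checked by brute force: generate every n-bit pattern with `itertools.product` and compare the count with $2^n$. A minimal Python sketch; `bit_patterns` is a hypothetical helper name.

```python
from itertools import product


def bit_patterns(n: int) -> list[str]:
    """List every n-bit pattern: each of the n positions is independently '0' or '1'."""
    return ["".join(p) for p in product("01", repeat=n)]


for n in range(1, 4):
    patterns = bit_patterns(n)
    assert len(patterns) == 2 ** n  # the product rule: 2 * 2 * ... * 2
    print(n, len(patterns), patterns)
```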

Next, as for the "byte":

A byte is a unit of data or information quantity meaning "multiple bits".

From around 1980 it was common for 1 byte to be 8 bits, but this was only formally defined in IEC 80000-13, published in 2008. 8 bits can represent 256 distinct values (for integers, for example, 0 to 255 unsigned or −128 to +127 signed).

Byte (information) - Wikipedia (Japanese)

⇧ Originally there was apparently no fixed rule for how many bits make up a byte, but in 2008 it was defined as

$1\ \mathrm{byte} = 8\ \mathrm{bit}$

So "8 bits" can represent

$2^8 = 2^{4} \times 2^{4} = 16 \times 16 = 256$

different combinations, I suppose.
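The two ranges in the quote above can be derived from those 256 patterns. A small Python sketch, assuming the signed range uses two's complement (the representation virtually all current hardware uses, and what `int.from_bytes(..., signed=True)` assumes):

```python
BITS = 8

# n bits give 2**n distinct patterns
unsigned_range = (0, 2 ** BITS - 1)
# Two's complement splits the same 256 patterns between negatives and non-negatives
signed_range = (-(2 ** (BITS - 1)), 2 ** (BITS - 1) - 1)
print(unsigned_range, signed_range)

# One byte with all 8 bits set, read two ways
raw = bytes([0b11111111])
as_unsigned = int.from_bytes(raw, "big", signed=False)
as_signed = int.from_bytes(raw, "big", signed=True)
print(as_unsigned, as_signed)
```

The same bit pattern `11111111` reads as 255 unsigned and −1 signed; only the interpretation (the "type") differs.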

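The capacity of electronic media is sometimes given in bits (for chips there are also structural reasons for this), but for the convenience of general users it is usually given in bytes. Addressing memory in byte units is a de facto standard established by the System/360 mentioned earlier.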

Byte (information) - Wikipedia (Japanese)

⇧ So it apparently all started with IBM's System/360 computer treating 8 bits as 1 byte, and bytes are used mainly for the convenience of general users...

Database data types and such

According to Wikipedia,

In computer science and computer programming, a data type or simply type is an attribute of data which tells the compiler or interpreter how the programmer intends to use the data. Most programming languages support basic data types of integer numbers (of varying sizes), floating-point numbers (which approximate real numbers), characters and Booleans.

https://en.wikipedia.org/wiki/Data_type

A data type constrains the values that an expression, such as a variable or a function, might take. This data type defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored. A data type provides a set of values from which an expression (i.e. variable, function, etc.) may take its values.

https://en.wikipedia.org/wiki/Data_type

⇧ That's what it says, but the complete list of data types still isn't clear...

Doesn't it cover database data types at all?

I wondered, and then found an explanation in the Wikipedia article on SQL.

The SQL standard defines three kinds of data types:

predefined data types
constructed types
user-defined types.

Constructed types are one of ARRAY, MULTISET, REF(erence), or ROW. User-defined types are comparable to classes in object-oriented language with their own constructors, observers, mutators, methods, inheritance, overloading, overwriting, interfaces, and so on. Predefined data types are intrinsically supported by the implementation.

https://en.wikipedia.org/wiki/SQL

Predefined data types

Character types

Character (CHAR)
Character varying (VARCHAR)
Character large object (CLOB)

National character types

National character (NCHAR)
National character varying (NCHAR VARYING)
National character large object (NCLOB)

Binary types

Binary (BINARY)
Binary varying (VARBINARY)
Binary large object (BLOB)

Numeric types

Exact numeric types (NUMERIC, DECIMAL, SMALLINT, INTEGER, BIGINT)
Approximate numeric types (FLOAT, REAL, DOUBLE PRECISION)
Decimal floating-point type (DECFLOAT)

Datetime types (DATE, TIME, TIMESTAMP)
Interval type (INTERVAL)
Boolean
XML
JSON

https://en.wikipedia.org/wiki/SQL

⇧ Unfortunately, there doesn't seem to be anything about character encodings and the like.

For character encodings and so on,

itskillmap.com

⇧ the site above explains them in detail.
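Since character encodings came up: the number of bytes a string needs depends on the encoding, which is one reason a database's character set setting matters (depending on the database, a column's length limit may be counted in characters or in bytes). A small Python sketch; the sample string and the three encodings are my own picks.

```python
text = "データ"  # "data" in katakana: 3 characters

for encoding in ("utf-8", "shift_jis", "utf-16-le"):
    encoded = text.encode(encoding)
    # Same 3 characters, different number of bytes per encoding
    print(f"{encoding:10} {len(text)} chars -> {len(encoded)} bytes: {encoded.hex(' ')}")
```

Each of these katakana takes 3 bytes in UTF-8 but 2 bytes in Shift_JIS and UTF-16, so the same text is 9 bytes in one encoding and 6 in the others.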

Assuming Wikipedia's explanation is accurate and complete, data types are defined in "standard SQL", but perhaps because each vendor's database implementation has its own quirks, the data types seem to differ between databases too.
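One concrete example of that vendor flavor, using SQLite through Python's built-in `sqlite3` module (my choice for illustration, not a database the sources discuss): SQLite accepts standard type names like `DECIMAL` or `BOOLEAN` in `CREATE TABLE`, but maps each declared type to a "type affinity" and stores values in its own storage classes. `typeof()` shows what was actually stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Columns declared with standard SQL type names
conn.execute("""
    CREATE TABLE sample (
        c_char    CHAR(10),
        c_varchar VARCHAR(255),
        c_integer INTEGER,
        c_decimal DECIMAL(10, 2),
        c_double  DOUBLE PRECISION,
        c_boolean BOOLEAN,
        c_date    DATE
    )
""")
conn.execute(
    "INSERT INTO sample VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("abc", "hello", 42, 3.14, 2.5, True, "2021-10-26"),
)
# typeof() reports the storage class SQLite actually used for each value
row = conn.execute("""
    SELECT typeof(c_char), typeof(c_varchar), typeof(c_integer),
           typeof(c_decimal), typeof(c_double), typeof(c_boolean), typeof(c_date)
    FROM sample
""").fetchone()
print(row)
conn.close()
```

In SQLite there is no separate boolean or date storage: the `BOOLEAN` value ends up as an integer and the `DATE` value as text, which is quite different from databases that have dedicated types for these.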

Also, programming languages and SQL are separate worlds, so you have to map each side's data types to the other's.

Oracle has published a correspondence table between a rather old version of Java and the common SQL data types.

Table 2 shows the default mapping between Java data types and common SQL data types. Some databases support only a subset of these data types. The details of the various mappings are explained in later sections.

https://docs.oracle.com/javase/jp/1.3/guide/jdbc/spec/jdbc-spec.frame8.html

[Image: Table 2, default mapping between Java data types and common SQL data types]

https://docs.oracle.com/javase/jp/1.3/guide/jdbc/spec/jdbc-spec.frame8.html

⇧ As shown above, data types differ between a programming language and SQL, so you need to check which types correspond to which.
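For a feel of what such a mapping looks like in another language, here is a Python sketch using the standard `sqlite3` module (my own choice; SQLite and Python are not part of the Oracle table above). It inserts values of several Python types into an untyped column and compares SQLite's storage class with the Python type that comes back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A column with no declared type, so SQLite keeps whatever storage class it is given
conn.execute("CREATE TABLE t (v)")
values = [None, 42, 3.5, "text", b"\x00\x01"]
conn.executemany("INSERT INTO t (v) VALUES (?)", [(v,) for v in values])

# Compare the SQL-side storage class with the Python type that comes back
rows = conn.execute("SELECT typeof(v), v FROM t ORDER BY rowid").fetchall()
mapping = [(sql_type, type(value).__name__) for sql_type, value in rows]
for sql_type, py_type in mapping:
    print(f"{sql_type:8} <-> {py_type}")
conn.close()
```

This only covers SQLite's five storage classes; richer databases have many more types, which is exactly where the correspondence tables get painful.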

Come to think of it, are correspondence tables like this published for languages other than Java?

www.instaclustr.com

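⇧ It seems you have no choice but to rely on the goodwill of people who really know the field, like the site above, which has put together a correspondence table between Java and PostgreSQL.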

Hmm, in this industry correspondence tables always seem hard to find...

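So, the key point: in a computer all information has to be expressible as "0" or "1", that is, in bits. That means database data, too, must ultimately be expressible in bits, or in bytes (groups of 8 bits), and how much space a value takes depends on its data type, I suppose.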
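As a rough illustration of "size depends on the data type", here is a Python sketch that measures fixed-width numeric values with the standard `struct` module. The type-to-format table is hypothetical and follows the storage sizes PostgreSQL documents for these types; other databases can differ (MySQL, for instance, treats `REAL` as `DOUBLE` by default).

```python
import struct

# Hypothetical mapping: SQL type name -> struct format code.
# "<" means standard sizes with no padding; sizes follow PostgreSQL's docs.
SQL_TYPE_FORMATS = {
    "smallint": "<h",
    "integer": "<i",
    "bigint": "<q",
    "real": "<f",
    "double precision": "<d",
}


def storage_bytes(sql_type: str) -> int:
    """Bytes needed for one value of the given fixed-width type."""
    return struct.calcsize(SQL_TYPE_FORMATS[sql_type])


for name in SQL_TYPE_FORMATS:
    size = storage_bytes(name)
    print(f"{name:16} {size} bytes = {size * 8} bits")

# The integer 1 packed as a 4-byte little-endian value: only the lowest bit is set
packed_one = struct.pack("<i", 1)
print(packed_one.hex())
```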

This should be covered in each database's documentation, under "data types" or similar, probably.

dev.mysql.com

⇧ MySQL's documentation lays this out more or less as a list.

Well, as always, the lingering fuzziness is off the charts...

That's all for this time.