Bulk-deleting data from an RDBMS: what should you actually do?

www.itmedia.co.jp

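New York City and its Department of Transportation (DOT) announced on August 22 (local time) that they had granted the city's first permit to test autonomous vehicles.

Waymo heads to New York: autonomous driving tests begin in "the most complex city in the US" - ITmedia NEWS

The first permit goes to Waymo, a Google affiliate, which may run test drives with eight vehicles through the end of September in parts of the city, including Manhattan south of 112th Street, Downtown Brooklyn, and the DUMBO neighborhood.

Waymo heads to New York: autonomous driving tests begin in "the most complex city in the US" - ITmedia NEWS

Waymo has already built a track record of autonomous driving in several cities, including Phoenix, San Francisco, and Los Angeles, and has logged more than 10 million autonomous trips in total across the US. The New York City tests will be the company's effort in the most densely populated urban environment with the most complex traffic to date.

Waymo heads to New York: autonomous driving tests begin in "the most complex city in the US" - ITmedia NEWS

⇧ What I wonder about is the traffic volume, and whether the cars can reliably recognize road signs, obstacles, and the like through object detection...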

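As for road signs in the US,

Overview

Road signs in the United States are standardized by the Manual on Uniform Traffic Control Devices (MUTCD) and its supplement, Standard Highway Signs (SHS); like Japan, the United States has not ratified the Vienna Convention on Road Traffic.

Road signs in the United States - Wikipedia (Japanese edition)

⇧ So the US goes its own way.

Incidentally,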

www.kccs.co.jp

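Combining the AI-based noise-removal technology for snow-covered and snowfall conditions researched by Associate Professor Emaru of Hokkaido University with the autonomous driving technology developed by KCCS, the two jointly developed an unmanned autonomous delivery robot that drives in the harsh conditions of snow cover and snowfall

Japan's first: jointly developed medium-speed, medium-size autonomous delivery robot that drives on snow succeeds in driving tests on quasi-public roads | KCCS

⇧ According to the above, research on bad-weather conditions seems to have made progress in Japan, but since the New York trial takes place around September, bad weather does not seem to be part of it...

In bad weather such as dense fog, recognizing road signs and obstacles seems like it would be difficult; I wonder whether the cars can fall back on position data and accurate, pre-built map data?

However,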

gigazine.net

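In September 2022, a man driving home from a party celebrating his daughter's ninth birthday died in a tragic accident when his car plunged off a collapsed bridge. His wife has sued Google, alleging that the accident was caused by Google Maps leaving the information about the collapsed bridge without an update for about ten years.

Google sued over fatal accident in which a car plunged off a broken bridge due to a fatal Google Maps error - GIGAZINE

⇧ As the above shows, map data can be left without updates...

I suppose autonomous driving will, for a while yet, have to operate under fine-grained constraints on its conditions...

I'm curious how many open problems remain for autonomous driving today and what state they are in...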

MySQL's official information leaves something to be desired...

On the official MySQL blog,

dev.mysql.com

MySQL Portfolio & Support Lifecycle

LTS Releases will follow the Oracle Lifetime Support Policy, which includes 5 years of premier and 3 years of extended support. Innovation releases will be supported until the next major & minor release.

https://dev.mysql.com/blog-archive/introducing-mysql-innovation-and-long-term-support-lts-versions/

⇧ there is material on LTS (Long Term Support), but its horizontal axis carries no concrete time span, which makes the situation far too hard to read...
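
Since the chart gives no dates, here is a minimal sketch of the arithmetic implied by the quoted policy (5 years of premier plus 3 years of extended support). The GA date below is a placeholder I made up for illustration, not a date taken from the MySQL blog or documentation, and `lts_support_window` is a hypothetical helper name.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def lts_support_window(ga: date, premier_years: int = 5, extended_years: int = 3):
    """Return (end of premier support, end of extended support) for a GA date."""
    premier_end = add_years(ga, premier_years)
    extended_end = add_years(premier_end, extended_years)
    return premier_end, extended_end


# Placeholder GA date (assumption), used only to show the calculation.
ga = date(2024, 1, 1)
premier_end, extended_end = lts_support_window(ga)
print(f"premier support until {premier_end}, extended support until {extended_end}")
```

With the real GA date of a given LTS release plugged in, this would give the two dates I wish the official chart showed.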

The official documentation too,

dev.mysql.com

https://dev.mysql.com/doc/refman/8.4/en/mysql-releases.html

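⇧ is just as hard to read...

It is only slightly better than the official blog post in that it comes with a legend...

For some reason, the English Wikipedia has the more complete information...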

MySQL (/ˌmaɪˌɛsˌkjuːˈɛl/)is an open-source relational database management system (RDBMS).

https://en.wikipedia.org/wiki/MySQL

Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language.

https://en.wikipedia.org/wiki/MySQL

MySQL is free and open-source software under the terms of the GNU General Public License, and is also available under a variety of proprietary licenses. MySQL was owned and sponsored by the Swedish company MySQL AB, which was bought by Sun Microsystems (now Oracle Corporation). In 2010, when Oracle acquired Sun, Widenius forked the open-source MySQL project to create MariaDB.

https://en.wikipedia.org/wiki/MySQL

History

https://en.wikipedia.org/wiki/MySQL

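⇧ According to the above, the LTS (Long Term Support) line is listed as "8.4 LTS", so LTS releases appear to exist only up to the 8.x series so far.

End-of-life (EOL) dates are also summarized there, and I can't help wondering why the official site doesn't organize this information itself...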

Bulk-deleting data from an RDBMS: what should you actually do?

To start with, taking MySQL as the example,

dev.mysql.com

https://dev.mysql.com/doc/refman/8.4/en/delete-optimization.html

To delete all rows from a MyISAM table, TRUNCATE TABLE tbl_name is faster than DELETE FROM tbl_name. Truncate operations are not transaction-safe; an error occurs when attempting one in the course of an active transaction or active table lock. See Section 15.1.37, “TRUNCATE TABLE Statement”.

https://dev.mysql.com/doc/refman/8.4/en/delete-optimization.html

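⇧ So when it is acceptable to delete every record, TRUNCATE reportedly performs better than DELETE (note that the quoted passage speaks specifically of MyISAM tables).

According to a summary on the official site of TiDB, which is MySQL-compatible,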

www.pingcap.com

In summary, understanding the key differences between DROP TABLE and TRUNCATE TABLE is essential for effective database management. While DROP TABLE removes the entire table structure and its data, TRUNCATE TABLE only deletes the data, preserving the table schema. Choosing the right command depends on your specific needs—whether you need to completely remove a table or simply clear its contents. We encourage you to practice and experiment with both commands in your MySQL environment to gain confidence and ensure optimal performance in your database operations.

https://www.pingcap.com/article/mysql-drop-table-vs-truncate-table-key-differences/

⇧ It's much the same explanation, and not very informative...

From what I found searching the web,

stackoverflow.com

https://stackoverflow.com/questions/67643258/deleting-billion-records-in-a-range-vs-exact-id-lookup-mysql

⇧ the advice is mostly to TRUNCATE; or rather, maybe there is simply no other way...

It surely depends on how the table is laid out, but

qiita.com

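When deleting all rows of a large table in MySQL (InnoDB) without opening a transaction, TRUNCATE TABLE is faster than DELETE; this verifies whether even a table on the scale of 100 million records can be cleared quickly.

Trying TRUNCATE TABLE on a 100-million-record table on Amazon RDS (MySQL) #AWS - Qiita

⇧ For a table with relatively few columns, the result suggests that at around 100 million records the time taken is not a concern.

Incidentally, someone asked what a good way to maintain 40 billion records would be: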

Scaling MySQL to 40 Billion Rows

I know very little about databases, but I inherited a system with a database (running MariaDB with InnoDB as the storage engine on a single server with 256 GB of RAM and 14 TB of storage on HDDs) that needs to have a table with about 40 billion rows. Currently, when the database reaches 3 billion rows insert performance falls off a cliff and the DB cannot keep up with our data feed. The previous dev team would just delete all of our production data at 3 billion rows without telling anyone. A few times I've been asked to generate reports for data that we don't have anymore, but we're supposed to.

https://www.reddit.com/r/mysql/comments/fuxjbi/scaling_mysql_to_40_billion_rows/

After 24 hours, the only reason any data would be queried is when management asks us to generate a report. This happens 2 or 3 times a month.

https://www.reddit.com/r/mysql/comments/fuxjbi/scaling_mysql_to_40_billion_rows/

What strategies for managing the data can anyone suggest to help?

My initial thought was that I need to keep all of the data within a 24 hour period in an "active" table, and keep older data in a partitioned "archive" table. This will allow me to increase the probability that the active data and index stays in the InnoDB buffer pool. After 24 hours I'd roll the active data into the archive table. However, I've read a few blogs, reddit posts and stack overflow questions where people say "for the love of god don't solve the problem like this, it creates more problems than it solves." Is this true?

https://www.reddit.com/r/mysql/comments/fuxjbi/scaling_mysql_to_40_billion_rows/

What can I do?

Are there storage solutions that are better suited for this? The data is relational in nature, but the current version of the data already uses partitions, so there are no foreign keys, just IDs that point to other tables.

Any help or ideas are greatly appreciated!

https://www.reddit.com/r/mysql/comments/fuxjbi/scaling_mysql_to_40_billion_rows/

⇧ It's frightening how the whole thing seems to have been dumped on them...
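
The partitioned "archive" table mentioned in the quoted post also bears on deletion: with a table partitioned by date range, old data can be removed by dropping whole partitions via MySQL's `ALTER TABLE ... DROP PARTITION` instead of deleting row by row. Below is a minimal sketch that only builds the statement. The table name `events` and the `pYYYYMMDD` partition naming are my own assumptions, not something taken from the post.

```python
from datetime import date, timedelta


def partitions_to_drop(partition_names, today, keep_days):
    """Pick partitions named pYYYYMMDD (assumed convention) older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in partition_names:
        day = date(int(name[1:5]), int(name[5:7]), int(name[7:9]))
        if day < cutoff:
            expired.append(name)
    return expired


def drop_partition_sql(table, names):
    """Build a single ALTER TABLE statement that drops the given partitions."""
    return f"ALTER TABLE `{table}` DROP PARTITION {', '.join(names)}"


# Hypothetical partition list; in practice it would come from the database.
names = ["p20250101", "p20250102", "p20250103"]
expired = partitions_to_drop(names, today=date(2025, 1, 3), keep_days=1)
if expired:
    print(drop_partition_sql("events", expired))
```

Whether this fits depends on the table already being partitioned on the column you delete by, so it is a design decision made up front rather than a fix applied afterwards.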

Incidentally,

shimx.hateblo.jp

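With old versions of InnoDB, TRUNCATE TABLE does not shrink the data file - shimxmemo

⇧ According to the site above, there is a trap if you are running an old version of MySQL.

In any case, the official documentation says little more than that TRUNCATE performs better than DELETE, so there seems to be no official "best practice" for deleting large volumes of data...

The official documentation does not address issues such as backing up the data before a TRUNCATE, but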

www.lifull.blog

Easily deleting unneeded MySQL data with table rotation - LIFULL Creators Blog

⇧ The site above introduces an approach that rotates multiple tables, but it looks like it would put a heavier burden on the application side...
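
To make the rotation idea concrete, here is a minimal sketch of the general technique as I understand it, not necessarily the exact procedure in the linked article: create an empty copy of the table, swap the names with one `RENAME TABLE` statement, and drop the retired table later. The table name `access_log`, the `_new` suffix, and the date suffix are hypothetical.

```python
import re

_IDENT = re.compile(r"[A-Za-z0-9_]+")


def rotation_statements(table: str, suffix: str) -> list[str]:
    """Build MySQL statements that swap in an empty copy of `table`."""
    for part in (table, suffix):
        if not _IDENT.fullmatch(part):
            raise ValueError(f"unsafe identifier part: {part!r}")
    fresh = f"{table}_new"        # empty table with the same definition
    retired = f"{table}_{suffix}"  # old data ends up under this name
    return [
        f"CREATE TABLE `{fresh}` LIKE `{table}`",
        f"RENAME TABLE `{table}` TO `{retired}`, `{fresh}` TO `{table}`",
        f"DROP TABLE `{retired}`",  # run only once the old data is no longer needed
    ]


for stmt in rotation_statements("access_log", "20250901"):
    print(stmt + ";")
```

Doing both renames in one statement means the application keeps using the same table name throughout. The retired table stays available as a backup until it is dropped, which also partly answers the question of preserving data before it disappears.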

There is also

yoku0825.blogspot.com

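Because this "deletes little by little", it cannot be used when you want a transaction to guarantee all-or-nothing. Assume id is the primary key (whether or not it is a surrogate key), and hoge and last_update are the columns you actually want to delete by. Tables with no primary key (or unique key) are not considered.

日々の覚書 (Daily Notes): a memo on deleting a large number of records from MySQL little by little

⇧ As the site above shows, if you can tolerate the data being temporarily inconsistent (some rows deleted, others not yet), there is an approach that issues DELETE in batches of a certain number of records.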
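
A minimal sketch of that batch-by-batch idea follows. It is not the linked article's code: to keep it runnable anywhere it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table name `t` and the chunk size are my own assumptions (the column names `id`, `hoge`, and `last_update` follow the quote). In MySQL itself, a single-table `DELETE` accepts `ORDER BY` and `LIMIT` directly, so the subquery used here for SQLite would not be needed.

```python
import sqlite3


def delete_in_chunks(conn, cutoff, chunk_size=1000):
    """Delete rows with last_update < cutoff, committing after every chunk."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM t WHERE id IN ("
            " SELECT id FROM t WHERE last_update < ? ORDER BY id LIMIT ?)",
            (cutoff, chunk_size),
        )
        conn.commit()  # each chunk is its own transaction, so no all-or-nothing
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, hoge TEXT, last_update TEXT)")
conn.executemany(
    "INSERT INTO t (hoge, last_update) VALUES (?, ?)",
    [("x", "2020-01-01" if i % 2 == 0 else "2030-01-01") for i in range(10)],
)
conn.commit()

deleted = delete_in_chunks(conn, "2025-01-01", chunk_size=3)
remaining = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(deleted, remaining)
```

Committing per chunk keeps each transaction small, at the cost that a failure midway leaves the table partially cleaned, which is exactly the trade-off the quote warns about.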

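I'd like to reduce the maintenance and operations burden as much as possible, but being forced into trade-offs is unavoidable...

There is no silver bullet, but I feel a de facto standard approach to deleting large volumes of data should have been established by now; perhaps nothing definitive has been found yet...

As always, I'm left with a nagging feeling of uncertainty...

That's all for this time.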