Verticaで更新系のクエリがどのNodeで実行されてデータが登録されたのかを確認したいのだけど...

A data dictionary, or metadata repository, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format".

Oracle defines it as a collection of tables with metadata. The term can have one of several closely related meanings pertaining to databases and database management systems (DBMS):

A document describing a database or collection of databases
An integral component of a DBMS that is required to determine its structure
A piece of middleware that extends or supplants the native data dictionary of a DBMS

https://en.wikipedia.org/wiki/Data_dictionary

⇧ Oracleのデータ・ディクショナリは、テーブルとしてOracle Databaseが管理しているっぽいのだけど、ユーザーは直接アクセスできないので、ビューを用意しているってことになるんかね。

話が脱線しましたが、Verticaにもメタデータ的なものを取得するためのテーブルが用意されているようなので、

docs.vertica.com

⇧ 上記を組み合わせて、SQLを組んでみました。

とりあえず、

SELECT 
 vqr.node_name
 , vls.schema_name
 , vls.table_name
 , vls.accepted_row_count
 , vls.rejected_row_count
 , vqr.statement_id
 , vqr.request_type
 , vqr.request
 , vqr.request_label
 , vqr.search_path
 , vqr.success
 , vqr.error_count
 , vqr.start_timestamp
 , var.end_timestamp
 , vqr.request_duration
 , vqr.request_duration_ms
 , vqr.is_executing
FROM v_monitor.load_streams vls 
  INNER JOIN v_monitor.query_requests vqr 
    ON vls.session_id = vqr.session_id
    AND vls.transaction_id = vqr.transaction_id
    AND vls.statement_id = vqr.statement_id;

⇧ こんな感じのを組んだのだけど、テーブル毎に、

transaction_id
statement_id

が異なる場合は、アウトなんだけど、流石に、同じ値になっていると信じたい...

実際は、WHERE句を追加して、start_timestampで日付とかで絞った方が良さそうだとは思う。

ちなみに、

https://docs.vertica.com/24.1.x/en/sql-reference/system-tables/v-monitor-schema/load-streams/

⇧ ドキュメントで、「STATEMENT_ID」の説明で、

『Unique numeric ID for the currently-running statement. NULL indicates that no statement is currently being processed. The combination of TRANSACTION_ID, STATEMENT_ID uniquely identifies a statement within a session.』

とあるのだけど、

「SESSION_ID」の説明で、

『Identifier of the session for which Vertica captures load stream information. This identifier is unique within the cluster for the current session, but can be reused in a subsequent session.』

とあるのが、不安な気持ちにさせてくれるけど、『「TRANSACTION_ID」＋「STATEMENT_ID」』の組み合わせで、重複することが無いと信じましょうか...

そもそも、Verticaが「列指向データベース」ということで更新処理に向いて無さそうということなのか、更新系のクエリの結果を確認するという用途が想定されていないんかな？

レコード（登録するデータ）は行ですものな...

ベストプラクティスにも載っていないってことは、そういうことなんでしょうね...

何やら、

docs.vertica.com

⇧ 更新系で無いならば、V_CATALOG schemaの「LOG_QUERIES」を使えば良さそうね。

テーブルに登録されたレコード数とかの情報は取得できないけど、テーブル名とかはrequestカラムの情報で取得できそうだけど、そう言うことじゃないんですよね...

ちなみに、

docs.vertica.com

⇧ 上記で、VerticaのNodeの全量が把握できるっぽい。

何と言うか、「V_MONITOR schema」だけでもテーブルの数が130個と...

どのテーブルを組み合わせれば目的とする情報が取得できるのか調査するのがしんど過ぎる...

仕事とは言え、不毛な作業の連続で、燃え尽き症候群にもなりますわな...

2024年2月9日（金）追記：↓ ここから

何やら、updateのクエリが拾えなかった...

www.vertica.com

https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/AdministratorsGuide/BulkLoadCOPY/MonitoringCopyLoadsMetrics.htm

⇧ 何故か、この説明が旧いバージョンのドキュメントにしか見当たらない...

docs.vertica.com

⇧ なるほど、Vertica特有の機能の情報しか拾ってくれないと...

www.vertica.com

https://www.vertica.com/kb/Data-Loading-in-Vertica-Using-COPY/Content/BestPractices/Data-Loading-in-Vertica-Using-COPY.htm

⇧ loadはCOPY statementのみ対応ってことなんですかね？

う～む、Vertica微妙...

2024年2月9日（金）追記：↑ ここまで

2024年2月10日（土）追記：↓ ここから

VerticaのJoinについてのドキュメントだと、

docs.vertica.com

⇧ とあるので、

stackoverflow.com

https://stackoverflow.com/questions/26923616/inner-join-left-right-outer-join

⇧ 上図のベン図で言うところの、『FULL OUTER JOIN』のAとBを合わせた全ての領域の情報を取得することが可能なようなので、「V_MONITOR schema」の「QUERY_REQUESTS」側で、COPY statement以外の更新系の情報を拾うようにするしかないってことですかね...

う～む...更新系のクエリの実行結果を取得する方法を『Best Practices for Monitoring Vertica』のページに載せて欲しい...

2024年2月10日（土）追記：↑ ここまで

2024年2月12日（月）追記：↓ ここから

ChatGPTに聞いてみたところ、以下のようなSQLを提案されました。

/opt/vertica/bin/vsql -U your_username -w your_password -c "SELECT DISTINCT qr.node_name AS initiator_node, qep.target_node_name AS executor_node, qr.request, qr.client_hostname, qr.schema_name FROM v_monitor.query_requests qr JOIN v_monitor.execution_engine_profiles qep ON qr.request_id = qep.request_id WHERE qr.request ILIKE '%INSERT INTO%' OR qr.request ILIKE '%UPDATE%' OR qr.request ILIKE '%COPY%';" -t -A > update_queries.txt

⇧ あとは、頑張って重複なしのテーブル名のリスト化すれば、更新系の処理が実施されたテーブルが分かると。

本当に、SQLが期待した情報を取得してくれるものなのかが分からんのだけど...

2024年2月12日（月）追記：↑ ここまで

毎度モヤモヤ感が半端ない...

今回はこのへんで。