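I have been trying to run a Hive query from a HiveOperator task. Hive and Airflow are both installed in Docker containers. I can query Hive tables from Python code inside the Airflow container, and I can also query them successfully through the Hive CLI. But when I run the Airflow DAG, I get an error saying that the hive/beeline file was not found.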
DAG:
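import airflow.utils.dates
from airflow import DAG
from airflow.providers.apache.hive.operators.hive import HiveOperator

# Run the DAG every minute, starting one day ago
dag_hive = DAG(
    dag_id="hive_script",
    schedule_interval="* * * * *",
    start_date=airflow.utils.dates.days_ago(1),
)

hql_query = """
CREATE TABLE IF NOT EXISTS mydb.test_af(
`test` int);
insert into mydb.test_af values (1);
"""

# The operator runs the HQL through the connection named "hive_local"
hive_task = HiveOperator(
    hql=hql_query,
    task_id="hive_script_task",
    hive_cli_conn_id="hive_local",
    dag=dag_hive,
)

hive_task

if __name__ == "__main__":
    dag_hive.cli()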
Logs:
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1331, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1361, in _execute_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/apache/hive/operators/hive.py", line 156, in execute
    self.hook.run_cli(hql=self.hql, schema=self.schema, hive_conf=self.hiveconfs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/apache/hive/hooks/hive.py", line 249, in run_cli
    hive_cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, cwd=tmp_dir, close_fds=True
  File "/usr/local/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'beeline': 'beeline'
[2021-08-19 12:22:04,291] {taskinstance.py:1551} INFO - Marking task as FAILED. dag_id=***_script, task_id=***_script_task, execution_date=20210819T122100, start_date=20210819T122204, end_date=20210819T122204
[2021-08-19 12:22:04,323] {local_task_job.py:149} INFO - Task exited with return code 1
It would be great if someone could help me. Thanks in advance.
A uj5u.com user replied:
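You need to install beeline in your Apache Airflow image. The details depend on which Airflow image you are using, but the Airflow "reference" image contains only the most common providers, and Hive is not among them. You should extend or customize the image so that beeline is on the PATH inside your Airflow image.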
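The traceback in the question ends inside Python's `subprocess` module, which suggests the hook starts `beeline` as an external program. The following is only a minimal sketch of that failure mode using the standard library, not Airflow's actual code; the helper name `try_launch` and the missing program name are made up for illustration.

```python
import subprocess


def try_launch(argv):
    """Try to start the program in argv; return the error if it is missing, else None."""
    try:
        proc = subprocess.Popen(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
    except FileNotFoundError as exc:
        # Raised when the executable cannot be found on PATH
        return exc
    # The program started; wait for it to finish and discard its output
    proc.communicate()
    return None


if __name__ == "__main__":
    # Hypothetical program name that is assumed not to exist on this machine
    error = try_launch(["no-such-binary-for-this-example"])
    print(type(error).__name__, error)
```

If this is the mechanism, it would also explain why querying Hive from Python code can work in the same container while the operator fails: the operator needs the `beeline` executable itself, not just network access to Hive.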
You can read more about extending or customizing the Airflow image at https://airflow.apache.org/docs/docker-stack/build.html#adding-new-apt-package.
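After rebuilding the image, one way to confirm that beeline is actually reachable is to run a quick check inside the container that executes the task. This is a sketch using only the standard library; `find_on_path` is a hypothetical helper name, and it assumes the check runs under the same user and environment as the Airflow task.

```python
import shutil


def find_on_path(executable):
    """Return the full path of `executable` if it is found on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    # Check both command-line clients mentioned in the error message
    for name in ("beeline", "hive"):
        location = find_on_path(name)
        if location is None:
            print(f"{name}: not found on PATH")
        else:
            print(f"{name}: {location}")
```

If `beeline` is reported as not found, the image still lacks the binary or its directory is missing from PATH.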
