How to Query More Than 10,000 Documents from Elasticsearch in Python
Background
I have recently been collecting and analysing data, and the collected data lives in Elasticsearch — there are already more than 100,000 documents. When I tried to pull all of the data out of ES for analysis, I ran into a problem. I was using from/size pagination: fetching documents 0 through 10,000 worked fine, but as soon as I asked for documents 10,000 through 20,000 the request failed. The query looked like this:
GET index/_search
{
"from": 10000,
"size" : 10000,
"query":{
"match_all":{}
}
}
The error returned was:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [20000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "new_channel",
"node" : "dLHMyyNfQVuY-RSE1tPguQ",
"reason" : {
"type" : "illegal_argument_exception",
"reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [20000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
}
}
],
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [20000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Result window is too large, from + size must be less than or equal to: [10000] but was [20000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
}
}
},
"status" : 400
}
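The enforced limit is stated right in the error reason, so a client can also recover it programmatically instead of hard-coding 10,000. A small sketch (the helper name is my own):

```python
import re

def result_window_limit(reason):
    """Pull the enforced max_result_window out of an ES
    'Result window is too large' error reason string."""
    m = re.search(r"less than or equal to: \[(\d+)\]", reason)
    return int(m.group(1)) if m else None

reason = ("Result window is too large, from + size must be less than "
          "or equal to: [10000] but was [20000].")
print(result_window_limit(reason))  # → 10000
```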
How to fix it
As the error message explains, ES has a protective limit: an ordinary search can only page through the first 10,000 hits (from + size must be ≤ 10,000). To go beyond that you either switch to a cursor-style pagination or change the index setting.
Here are two ways to solve it.
1. Change the setting on the index directly. The upside is that this is quick and convenient, but you need the privileges to do it, and raising the window can degrade ES performance, since deep from/size pages are expensive in memory and CPU.
The request looks like this:
PUT index/_settings
{
"index":{
"max_result_window":10000000
}
}
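The same change can be made from Python via the elasticsearch-py client's `indices.put_settings`. A sketch — the index name is a placeholder, and `es` is assumed to be an already-connected client:

```python
def max_window_settings(limit):
    """Settings body that raises index.max_result_window to `limit`."""
    return {"index": {"max_result_window": limit}}

# With an elasticsearch-py client, assuming `es` is already connected:
# es.indices.put_settings(index="new_channel", body=max_window_settings(10_000_000))
print(max_window_settings(10_000_000))
```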
2. Use Elasticsearch's search_after
First, the initial query — no cursor is needed yet:
GET index/_search
{
"size": 10000,
"query": {
"match_all": {}
},
"sort": [
{ "_id.keyword": "desc" },   # sort by the document's unique ID
{ "upload_date": "asc" }     # then by upload time
]
}
Running this returns hits like the following (an excerpt: the last hit's `_source` followed by its `sort` values):
{
"platform" : "ig",
"channel_id" : "fffe365d-0e9f-4128-a53f-74fc47490edc",
"post_id" : "2436337532322256681",
"main_id" : "28ea9b5c-ceeb-485f-984e-c815b9e957d2",
"upload_date" : "2020-11-06T09:14:48",
"post_title" : "",
"description" : "",
"categories" : [ ],
"tag_person" : "",
"tags" : [ ],
"view_count" : 0,
"like_count" : 1634,
"dislike_count" : 0,
"average_rating" : 0,
"shortcode" : "CHPnBPPBRcp",
"comment_counts" : 13,
"last_update" : "2021-03-19T03:17:14.820854",
"follow" : 42804
},
"sort" : [
"fffe365d-0e9f-4128-a53f-74fc47490edc",
1604654088000
]
}
To fetch the next page, simply take the `sort` values of the last hit and pass them as `search_after` in the next query.
The complete Python code is as follows:
from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a reachable cluster; add hosts/auth as needed
INDEX = ''  # index name was left blank in the original

body = {
    "size": 10000,
    "query": {
        "match_all": {}
    },
    "sort": [
        {"channel_id.keyword": "desc"},
        {"created_at": "asc"}
    ]
}
data = es.search(index=INDEX, body=body)

seen_ids = []  # document IDs collected so far
while True:
    for hit in data['hits']['hits']:
        if hit['_id'] in seen_ids:
            continue
        seen_ids.append(hit['_id'])
    if data['hits']['hits']:
        # the last hit's sort values become the cursor for the next page
        after = data['hits']['hits'][-1]['sort']
    else:
        print('done')
        break
    print(len(seen_ids))
    data = es.search(index=INDEX, body={**body, "search_after": after})
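The paging loop can also be factored into a small driver that takes any search callable, which makes the cursor logic reusable and testable without a live cluster. This is a sketch; `search_fn` stands in for a pre-configured `es.search` call that accepts the `search_after` cursor:

```python
def collect_ids(search_fn):
    """Walk search_after pages until an empty page comes back,
    returning every document ID in order."""
    ids = []
    after = None  # the first page has no cursor
    while True:
        hits = search_fn(after)["hits"]["hits"]
        if not hits:
            return ids
        ids.extend(h["_id"] for h in hits)
        after = hits[-1]["sort"]  # cursor for the next page
```

In production `search_fn` would be something like `lambda after: es.search(index=INDEX, body={**body, **({"search_after": after} if after else {})})`.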
Original article: https://blog.itblood.com/6887.html. Please credit the source when reposting.
