What is query clustering?

To attract targeted traffic, you need to plan the site structure so that users land on the pages they need and find answers to their questions there.

The most important stage of work to achieve this goal is the clustering of search queries.

What is query clustering for?

You have collected keywords to promote the site. What next? You need to distribute these phrases across the pages of the site (if the site already exists) or plan which pages to create for the collected queries.

It doesn’t make sense to create a separate page for each query. A semantic core usually contains many phrases that are similar in meaning, so the keywords need to be grouped. This grouping is clustering.

The size of the semantic core depends on the scale of the project. A large set of key phrases needs to be sorted and regrouped so that the broad topics users are interested in emerge. These broad blocks are then divided into smaller ones, and as a result of clustering we get a query tree instead of an endless list of phrases.

The purpose of clustering is to bring every user who is looking for the same kind of item to the page where that item is presented; in other words, to determine which queries can be promoted on the same page.

For example, suppose we have two queries in a group:

  • black plaid in the living room interior
  • buy a black plaid

It is important to understand that we will not be able to promote these queries on one page. They produce different search results: the first query is more informational, so its results consist mostly of informational documents, while the second is commercial. Because promoting both queries on the same page is almost impossible, they are placed in separate clusters, and a corresponding page is created for each of them.

To check this, the search results for each query are reviewed. It is usually enough to look at the first page to see whether any URLs appear in both sets of results. If at least 4 to 6 URLs match, the queries can be grouped.

However, there are other, more complex commercial queries that require careful clustering and analysis of the search engine results.
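The first-page overlap check described above can be sketched in a few lines of Python. This is only an illustration: the result lists are made-up placeholder URLs rather than real search data, the function names are hypothetical, and the default threshold of 4 is simply the lower bound mentioned in the text.

```python
def shared_urls(results_a, results_b):
    """Return the URLs that appear in both first-page result lists."""
    return set(results_a) & set(results_b)


def can_group(results_a, results_b, threshold=4):
    """Two queries may share a page if enough first-page URLs overlap."""
    return len(shared_urls(results_a, results_b)) >= threshold


# Placeholder first-page results (10 URLs each), invented for the example.
serp_a = [f"https://shop{i}.example/" for i in range(1, 11)]   # shop1..shop10
serp_b = [f"https://shop{i}.example/" for i in range(6, 16)]   # shop6..shop15
serp_c = [f"https://blog{i}.example/" for i in range(1, 11)]   # no overlap

print(can_group(serp_a, serp_b))  # five shared URLs, so the pair can be grouped
print(can_group(serp_a, serp_c))  # no shared URLs, so the pair stays apart
```

In practice the result lists would come from whatever rank-tracking or scraping tool you already use; the comparison itself stays the same.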

In general, clustering is necessary for:

  • competent promotion of all pages for the necessary search queries;
  • effective preparation of briefs for content writers (copywriters, rewriters);
  • cost reduction, because proper clustering and the high-quality content that results from it can bring pages to the top of the search results for some queries without additional costs such as buying links.

What are the clustering methods?

Grouping into clusters can be done manually (if there are few queries) or automatically. Even in the latter case a person still has to take part, because automatic clustering leaves some of the queries unsorted.

Clustering algorithms

There are several algorithms for clustering key queries:

Clustering by top.

To determine the semantic affiliation of phrases, programs rely on the search engines' own algorithms by analyzing the composition of the top search results. The program sends queries, for example “food for elderly dogs” and “food for adult dogs”, and receives two fundamentally different sets of results in response.

Conclusion: these keys belong to different clusters. By contrast, the results for the queries “adult dog food” and “elderly dog food” are likely to be the same, which means these keys belong to the same cluster. Every query is checked in this way.

Clustering by word form.

Grouping by word form assigns phrases to one group if the words in them share the same roots. For example, the queries “norm of leukocytes in the blood of men” and “normal leukocytes in the blood of men” belong to the same cluster, since they consist only of words with the same roots.

Clustering by question/non-question.

Dividing queries into interrogative and declarative ones makes sense only if there really are many interrogative keys and they can be grouped on separate pages (without declarative queries).

In most cases this method is not the best choice, since users phrase queries on the same topic both as questions and as statements: for example, “how to replace the faucet in the kitchen” and “replacing the faucet in the kitchen”.
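As a rough sketch of word-form grouping, the Python below reduces each word to a crude root and groups queries whose sets of roots are identical. The suffix list and stopword list are toy assumptions chosen only to make the example above work; a real tool would use a proper stemmer or lemmatizer for the language in question.

```python
from collections import defaultdict

# Toy assumptions: a tiny stopword list and a few English suffixes.
STOPWORDS = {"of", "in", "the", "a", "an", "for", "to"}
SUFFIXES = ("al", "es", "s")


def crude_stem(word):
    """Strip one known suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def root_key(query):
    """The set of crude roots in a query, ignoring stopwords and word order."""
    return frozenset(
        crude_stem(w) for w in query.lower().split() if w not in STOPWORDS
    )


def group_by_word_form(queries):
    """Group queries that consist of the same roots."""
    groups = defaultdict(list)
    for query in queries:
        groups[root_key(query)].append(query)
    return list(groups.values())


queries = [
    "norm of leukocytes in the blood of men",
    "normal leukocytes in the blood of men",
    "leukocytes in urine",
]
groups = group_by_word_form(queries)
for group in groups:
    print(group)
```

With this toy stemmer the first two queries fall into one group and the third forms its own.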
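A simple way to see whether a question-based split is worth considering is to count how many keys are phrased as questions. The sketch below assumes English queries and a small, hand-picked list of question words; both the word list and the function names are illustrative assumptions.

```python
# Assumed list of English question openers; extend it for your own data.
QUESTION_WORDS = {"how", "what", "why", "when", "where", "which", "who", "can"}


def is_question(query):
    """Treat a query as interrogative if it ends with '?' or opens with a question word."""
    words = query.lower().split()
    return query.strip().endswith("?") or (bool(words) and words[0] in QUESTION_WORDS)


def split_by_form(queries):
    """Return (interrogative, declarative) lists of queries."""
    questions = [q for q in queries if is_question(q)]
    statements = [q for q in queries if not is_question(q)]
    return questions, statements


queries = [
    "how to replace the faucet in the kitchen",
    "replacing the faucet in the kitchen",
    "kitchen faucet replacement cost",
]
questions, statements = split_by_form(queries)
question_share = len(questions) / len(queries)
print(questions, statements, round(question_share, 2))
```

If the share of interrogative keys is small, or the questions cover the same topics as the declarative keys, the split is unlikely to justify separate pages.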

Clustering by TOP is divided into 3 types

Soft.

One of the most frequent keys is taken as the basis, and all the others are matched against it. For example, “buy a wool blanket” is the main key, and “buy a wool blanket in Moscow” and “buy a wool blanket price” are added to it. We look at the top 10 search results for each query and count how many URLs it shares with the main key; if the number of matches reaches or exceeds the threshold, the keyword is added to the cluster. As a result, every key in the cluster is linked to the main high-frequency word or phrase, but the keys are not necessarily linked to each other. This approach suits informational websites and online stores with a simple structure. Soft grouping yields a lot of data but is not very accurate: mistakes occur when phrases are distributed across pages for promotion.
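A minimal Python sketch of this soft grouping follows. The URL labels, frequencies, and threshold of 3 are invented for the example, and the function name is hypothetical; the point is only that each key is compared with the main key and never with the other members.

```python
def soft_cluster(serps, frequency, threshold=3):
    """Group queries around the most frequent remaining key (soft method)."""
    remaining = sorted(serps, key=lambda q: frequency[q], reverse=True)
    clusters = []
    while remaining:
        head = remaining.pop(0)            # most frequent unassigned key
        head_urls = set(serps[head])
        members = [
            q for q in remaining
            if len(head_urls & set(serps[q])) >= threshold
        ]
        remaining = [q for q in remaining if q not in members]
        clusters.append([head] + members)
    return clusters


# Placeholder top results (shortened to 5 URL labels per query).
serps = {
    "buy a wool blanket": ["u1", "u2", "u3", "u4", "u5"],
    "buy a wool blanket in Moscow": ["u1", "u2", "u3", "u8", "u9"],
    "buy a wool blanket price": ["u3", "u4", "u5", "u6", "u7"],
    "wool blanket care": ["u20", "u21", "u22", "u23", "u24"],
}
frequency = {
    "buy a wool blanket": 1000,
    "buy a wool blanket in Moscow": 300,
    "buy a wool blanket price": 200,
    "wool blanket care": 100,
}
clusters = soft_cluster(serps, frequency)
for cluster in clusters:
    print(cluster)
```

In this sample data the "in Moscow" and "price" keys each share three URLs with the main key but only one with each other, yet both end up in the same cluster. That is the looseness the text describes.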

Moderate.

Something in between soft and hard. We take the most frequent phrase as the basis and compare the rest with it by the number of shared URLs. Up to this point moderate grouping is no different from soft grouping, but then all the keywords are also compared with each other. If the number of intersecting URLs reaches or exceeds the threshold in every pair, a cluster is formed. So within one group every pair of queries is linked, but the shared URLs may differ from pair to pair.

Hard.

The most frequent phrase is compared with the others by the number of shared URLs in the top 10 of the search results, and in addition all phrases are compared with each other, along with all the URLs in the resulting pairs. That is, phrases are grouped into a cluster only when the same top 10 URLs appear for all the queries and their number reaches or exceeds the required threshold. The keywords match each other, and the URLs of the matching queries are similar. The hard method is suitable for competitive niches and heavily commercial services such as insurance, lending, and so on. These topics are characterized by an excessive number of synonyms and of direct queries on different topics. With hard clustering the webmaster gets less data than with soft clustering, but it is much more accurate.
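The difference between the two stricter methods can be sketched in Python as follows. This is one reading of the description, not a reference implementation: in "moderate" mode a key must share enough URLs with every key already in the cluster, while in "hard" mode the set of URLs common to all members must stay at or above the threshold. The queries, URL labels, frequencies, and threshold are invented for illustration.

```python
def cluster_by_top(serps, frequency, threshold=3, mode="moderate"):
    """Greedy grouping by shared top URLs; mode is 'moderate' or 'hard'."""
    remaining = sorted(serps, key=lambda q: frequency[q], reverse=True)
    clusters = []
    while remaining:
        head = remaining.pop(0)
        members = [head]
        common = set(serps[head])          # URLs shared by ALL members so far
        for query in list(remaining):
            urls = set(serps[query])
            if mode == "hard":
                # The same URLs must be shared by every member.
                fits = len(common & urls) >= threshold
            else:
                # Each pair must overlap, but the shared URLs may differ.
                fits = all(
                    len(urls & set(serps[m])) >= threshold for m in members
                )
            if fits:
                members.append(query)
                common &= urls
                remaining.remove(query)
        clusters.append(members)
    return clusters


# Placeholder data: every pair shares 4 URLs, all three share only 2.
serps = {
    "car insurance": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "car insurance online": ["u1", "u2", "u3", "u4", "u7", "u8"],
    "car insurance calculator": ["u3", "u4", "u5", "u6", "u7", "u8"],
}
frequency = {
    "car insurance": 900,
    "car insurance online": 400,
    "car insurance calculator": 250,
}
moderate = cluster_by_top(serps, frequency, mode="moderate")
hard = cluster_by_top(serps, frequency, mode="hard")
print(moderate)
print(hard)
```

With this sample data the moderate mode keeps all three keys together, while the hard mode moves the third key into its own cluster because only two URLs are common to all three result lists.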

Query clustering problems

Grouping queries is not a particularly difficult task, especially given the capabilities of modern programs. The main problem is the list of keys that were not automatically assigned to any of the groups. In some cases such queries can make up as much as 30% of the total. Machine algorithms help, but you still cannot do without your own judgment and manual work.

Manual work is especially relevant for sites on highly specialized topics that are poorly covered on the Internet. For example, if we cluster queries about wheat, the keys that include variety names (of which there are more than 30) will remain ungrouped, since neither the search engine nor the software algorithm can assess their meaning and subject affiliation.

In addition, keep in mind that search algorithms change constantly and are very sensitive. The top results today and a month from now will differ, which means that clusters built at different times will differ both in how they are divided and in what they contain.
