Allow code search by filename (#32210)

This is a large and complex PR, so let me explain in detail its changes.

First, I had to create new index mappings for Bleve and ElasticSerach as
the current ones do not support search by filename. This requires Gitea
to recreate the code search indexes (I do not know if this is a breaking
change, but I feel it deserves a heads-up).

I've used [this
approach](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/analysis-pathhierarchy-tokenizer.html)
to model the filename index. It allows us to efficiently search for both
the full path and the name of a file. Bleve, however, does not support
this out-of-box, so I had to code a brand new [token
filter](https://blevesearch.com/docs/Token-Filters/) to generate the
search terms.

I also did an overhaul in the `indexer_test.go` file. It now asserts the
order of the expected results (this is important since matches based on
the name of a file are more relevant than those based on its content).
I've added new test scenarios that deal with searching by filename. They
use a new repo included in the Gitea fixture.

The screenshot below depicts how Gitea shows the search results. It
shows results based on content in the same way as the current version
does. In matches based on the filename, the first seven lines of the
file contents are shown (BTW, this is how GitHub does it).


![image](https://github.com/user-attachments/assets/9d938d86-1a8d-4f89-8644-1921a473e858)

Resolves #32096

---------

Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
This commit is contained in:
Bruno Sofiato
2024-10-11 20:35:04 -03:00
committed by GitHub
parent 0fe5e2b08c
commit 900ac62251
38 changed files with 721 additions and 50 deletions

View File

@@ -0,0 +1 @@
ref: refs/heads/master

View File

@@ -0,0 +1,4 @@
[core]
repositoryformatversion = 0
filemode = true
bare = true

View File

@@ -0,0 +1,8 @@
This repository will be used to test code search. The snippet below shows its directory structure
.
├── avocado.md
├── cucumber.md
├── ham.md
└── potato
└── ham.md

View File

@@ -0,0 +1,7 @@
#!/usr/bin/env bash
ORI_DIR=`pwd`
SHELL_FOLDER=$(cd "$(dirname "$0")";pwd)
cd "$ORI_DIR"
for i in `ls "$SHELL_FOLDER/post-receive.d"`; do
sh "$SHELL_FOLDER/post-receive.d/$i"
done

View File

@@ -0,0 +1,2 @@
#!/usr/bin/env bash
"$GITEA_ROOT/gitea" hook --config="$GITEA_ROOT/$GITEA_CONF" post-receive

View File

@@ -0,0 +1,7 @@
#!/usr/bin/env bash
ORI_DIR=`pwd`
SHELL_FOLDER=$(cd "$(dirname "$0")";pwd)
cd "$ORI_DIR"
for i in `ls "$SHELL_FOLDER/pre-receive.d"`; do
sh "$SHELL_FOLDER/pre-receive.d/$i"
done

View File

@@ -0,0 +1,2 @@
#!/usr/bin/env bash
"$GITEA_ROOT/gitea" hook --config="$GITEA_ROOT/$GITEA_CONF" pre-receive

View File

@@ -0,0 +1,7 @@
#!/usr/bin/env bash
ORI_DIR=`pwd`
SHELL_FOLDER=$(cd "$(dirname "$0")";pwd)
cd "$ORI_DIR"
for i in `ls "$SHELL_FOLDER/proc-receive.d"`; do
sh "$SHELL_FOLDER/proc-receive.d/$i"
done

View File

@@ -0,0 +1,2 @@
#!/usr/bin/env bash
"$GITEA_ROOT/gitea" hook --config="$GITEA_ROOT/$GITEA_CONF" proc-receive

View File

@@ -0,0 +1,7 @@
#!/usr/bin/env bash
ORI_DIR=`pwd`
SHELL_FOLDER=$(cd "$(dirname "$0")";pwd)
cd "$ORI_DIR"
for i in `ls "$SHELL_FOLDER/update.d"`; do
sh "$SHELL_FOLDER/update.d/$i" $1 $2 $3
done

View File

@@ -0,0 +1,2 @@
#!/usr/bin/env bash
"$GITEA_ROOT/gitea" hook --config="$GITEA_ROOT/$GITEA_CONF" update $1 $2 $3

View File

@@ -0,0 +1,6 @@
# git ls-files --others --exclude-from=.git/info/exclude
# Lines that start with '#' are comments.
# For a project mostly in C, the following would be a good set of
# exclude patterns (uncomment them if you want to use them):
# *.[oa]
# *~

View File

@@ -0,0 +1,13 @@
90c1019714259b24fb81711d4416ac0f18667dfa refs/heads/DefaultBranch
985f0301dba5e7b34be866819cd15ad3d8f508ee refs/heads/branch2
65f1bf27bc3bf70f64657658635e66094edbcb4d refs/heads/develop
65f1bf27bc3bf70f64657658635e66094edbcb4d refs/heads/feature/1
78fb907e3a3309eae4fe8fef030874cebbf1cd5e refs/heads/home-md-img-check
3731fe53b763859aaf83e703ee731f6b9447ff1e refs/heads/master
62fb502a7172d4453f0322a2cc85bddffa57f07a refs/heads/pr-to-update
4649299398e4d39a5c09eb4f534df6f1e1eb87cc refs/heads/sub-home-md-img-check
3fa2f829675543ecfc16b2891aebe8bf0608a8f4 refs/notes/commits
4a357436d925b5c974181ff12a994538ddc5a269 refs/pull/2/head
5f22f7d0d95d614d25a5b68592adb345a4b5c7fd refs/pull/3/head
62fb502a7172d4453f0322a2cc85bddffa57f07a refs/pull/5/head
65f1bf27bc3bf70f64657658635e66094edbcb4d refs/tags/v1.1

View File

@@ -0,0 +1,2 @@
P pack-393dc29256bc27cb2ec73898507df710be7a3cf5.pack

View File

@@ -0,0 +1,14 @@
# pack-refs with: peeled fully-peeled sorted
90c1019714259b24fb81711d4416ac0f18667dfa refs/heads/DefaultBranch
985f0301dba5e7b34be866819cd15ad3d8f508ee refs/heads/branch2
65f1bf27bc3bf70f64657658635e66094edbcb4d refs/heads/develop
65f1bf27bc3bf70f64657658635e66094edbcb4d refs/heads/feature/1
78fb907e3a3309eae4fe8fef030874cebbf1cd5e refs/heads/home-md-img-check
3731fe53b763859aaf83e703ee731f6b9447ff1e refs/heads/master
62fb502a7172d4453f0322a2cc85bddffa57f07a refs/heads/pr-to-update
4649299398e4d39a5c09eb4f534df6f1e1eb87cc refs/heads/sub-home-md-img-check
3fa2f829675543ecfc16b2891aebe8bf0608a8f4 refs/notes/commits
4a357436d925b5c974181ff12a994538ddc5a269 refs/pull/2/head
5f22f7d0d95d614d25a5b68592adb345a4b5c7fd refs/pull/3/head
62fb502a7172d4453f0322a2cc85bddffa57f07a refs/pull/5/head
65f1bf27bc3bf70f64657658635e66094edbcb4d refs/tags/v1.1