Setting up Searx

What is Searx?

For those who don't know, searx is a decentralized metasearch engine written in Python that aggregates results from more than 70 search services. Searx isn't just one site; you can use one of the public instances or install your own. Because it is free software, anyone can install it on any server, and I run my own instance on the same server as my website. The installation is well documented on the searx site, but I would like to add some side notes and keep everything in one place. Hopefully it will help some of you through the process.

You may be wondering what the point is of running your own search engine instead of using something like Google. The answer is freedom. Beyond that, the way searx works has real advantages over using Google directly. Searx takes whatever term you're searching for, pulls results from several different search engines, and displays them all in one place. It is quite similar to Startpage, except that Startpage only uses Google's search results (without you contacting Google directly), whereas searx pulls results from many other search engines as well. The advantage is that you aren't sending your requests straight to Google's servers; your instance queries the third-party search engines and aggregates the results for you. Searx is also highly extensible, unlike most search engines, which offer very little customization. You can enable a dark theme, choose which kinds of results the search engines should return, and so on. Installing searx is really not that hard: all you have to do is copy and paste a few commands, and within about 10 minutes you will have your own search engine up and running.

I assume many of you have also heard of DuckDuckGo. However, even search engines like that fall short of searx when it comes to features. Another crucial question is whether you really know what DuckDuckGo is running on its servers, or what exactly it logs about its users' search requests. Like many other applications, such as Telegram, it claims to be free and open source. Of course you can view the published source code of those applications, but once your information is processed on their servers, you have no way of knowing what software is actually running there beyond the open source code. The same goes for many other search engines and applications that claim to protect your privacy. That is why searx exists, and it might be by far the best search engine available right now. By using searx, you're not directly accessing DuckDuckGo or Google; searx does that for you.


The majority of "open source" search engines such as DuckDuckGo actually run on proprietary servers, meaning that you have no way of knowing what they are truly running there. You basically just have to "trust" them.

When you search in searx, you can see which search engines each result was pulled from. There are also different categories to pick from. If you're specifically searching for files or images, you can click on that category and searx will show only the results of that type. The files category pulls results from a number of torrent sites, which can be quite handy. Searx also has categories for news, music, maps (OpenStreetMap is supported), it (Stack Overflow) and more. One great thing about searx is that it has a lot more categories than Google and DuckDuckGo. You can even select which search engines searx pulls results from, under the preferences tab in the upper right-hand corner. So if one search engine is taking particularly long to load, you can disable it. As I mentioned before, the most significant part is that your computer does not send requests directly to the other search engines; searx does that for you.

You can use searx shortcuts when searching. If you only want files for a specific search term, you can use "!files term" and only file results will be shown. Similarly, "!images term" searches only for images. The large number of available shortcuts makes searching much more efficient. You can restrict a search to a specific engine too: to use only Startpage, type "!sp term" and results will be pulled only from that search engine. On top of that, for academic articles I have my open access DOI resolver set to sci-hub.tw, which redirects DOI links to that site and gets around paywalls. If you are looking for free books, you can enable Library Genesis as one of your search engines, and the next time you search for a particular book, searx will pull results from that site. Once again, a lot of this can be found under the preferences tab within the search engine.
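The shortcuts also work when you build a search link by hand, for example for a browser keyword or a script. The following is only a sketch: the address searx.example.org and the helper names urlencode and searx_query_url are made up for illustration, and it assumes your instance accepts GET requests at /search with the query in the q parameter.

```shell
# Assumed instance address; replace it with your own domain.
SEARX_URL="https://searx.example.org"

# Percent-encode an ASCII string for use in a URL query.
# The body runs in a subshell, so its variables do not leak out.
urlencode() (
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        s=$rest
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s' "$out"
)

# Build a search URL from a shortcut (!files, !images, !sp ...) plus the terms.
searx_query_url() {
    printf '%s/search?q=%s\n' "$SEARX_URL" "$(urlencode "$*")"
}

searx_query_url '!images' mountain lake
searx_query_url '!sp' free software
```

The "!" has to be percent-encoded like any other special character, which is why the sketch does not simply paste the terms into the URL.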

Installing Searx with Nginx

Before Installing Searx

Now that you know roughly what searx is, it is time to install it, for those who would like to run it on their own servers. The commands below all come from the searx documentation: (1) Step by step installation, (2) uwsgi and (3) Install with nginx. I will be including some additional commands as well. The process is basically copying and pasting a series of commands, so feel free to do just that if you don't fully understand what you're doing. A side note: the commands I provide are primarily for nginx servers, but you can find the equivalent for apache in the documentation under Install with apache. I used apache a while back, but I find nginx somewhat simpler to configure, and it's more lightweight too.

Install Packages

$ sudo -H apt-get install -y \
    virtualenv python3-dev python3-babel python3-venv \
    uwsgi uwsgi-plugin-python3 \
    git build-essential libxslt-dev zlib1g-dev libffi-dev libssl-dev \
    shellcheck

Create User

If the first command below for creating the searx user doesn't work, run sudo -H useradd searx -d /usr/local/searx instead.

$ sudo -H useradd --shell /bin/bash --system \
    --home-dir /usr/local/searx \
    --comment 'Privacy-respecting metasearch engine' searx

$ sudo -H mkdir /usr/local/searx
$ sudo -H chown -R searx:searx /usr/local/searx

Install Searx & Dependencies

To start an interactive shell as the newly created user and clone searx, run:

$ sudo -H -u searx -i
(searx)$ git clone https://github.com/asciimoo/searx.git /usr/local/searx/searx-src

To create virtualenv in the same shell:

(searx)$ python3 -m venv /usr/local/searx/searx-pyenv
(searx)$ echo . /usr/local/searx/searx-pyenv/bin/activate >>  /usr/local/searx/.profile

Next, exit the searx shell session you opened above by pressing ctrl-d, so that you can start a new one. Before you install anything, it is recommended to check that your virtualenv is sourced at login (~/.profile):

$ sudo -H -u searx -i

(searx)$ command -v python && python --version
/usr/local/searx/searx-pyenv/bin/python
Python 3.8.1

# update pip's boilerplate ..
(searx)$ pip install -U pip
(searx)$ pip install -U setuptools
(searx)$ pip install -U wheel

# jump to searx's working tree and install searx into virtualenv
(searx)$ cd /usr/local/searx/searx-src
(searx)$ pip install -e .

Configuration

Next, you are going to create a copy of the settings.yml configuration file from the searx source tree inside /etc/searx. The first command may fail if the /etc/searx directory does not exist yet; in that case, create it by running sudo mkdir /etc/searx. I'm using vim to edit the file here, but you can use whatever text editor you prefer. The last command can be skipped if you set the name of your searx instance by hand in /etc/searx/settings.yml: look for instance_name and put whatever you want your instance to be called inside the quotation marks. In that same file, I tend to set oscar_style to logicodev-dark, so that the dark theme is enabled by default.

$ sudo -H cp /usr/local/searx/searx-src/searx/settings.yml /etc/searx/settings.yml
$ sudo -H vim /etc/searx/settings.yml
# generate a fresh random secret key rather than reusing a published one
# (assumes the openssl command is available)
$ sudo -H sed -i -e "s/ultrasecretkey/$(openssl rand -hex 16)/g" /etc/searx/settings.yml
$ sudo -H sed -i -e "s/{instance_name}/searx@ryzen/g" /etc/searx/settings.yml
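If you would like to see what the two sed substitutions do before touching the real file, you can rehearse them on a scratch copy. This is only a sketch: the four-line excerpt is my assumption of what the relevant lines look like, the key is generated with od and tr instead of a fixed value, and sed -i is used the way GNU sed (the Debian default) understands it.

```shell
# Scratch file with an assumed excerpt of settings.yml.
tmp_settings=$(mktemp)
cat > "$tmp_settings" <<'EOF'
general:
    instance_name : "{instance_name}"
server:
    secret_key : "ultrasecretkey" # change this!
EOF

# 16 random bytes written as 32 hex characters.
new_key=$(od -An -tx1 -N16 /dev/urandom | tr -d ' \n')

# Quote each sed expression so the shell hands it over as a single argument.
sed -i -e "s/ultrasecretkey/$new_key/g" "$tmp_settings"
sed -i -e "s/{instance_name}/searx@$(uname -n)/g" "$tmp_settings"

# Show the two lines that were changed.
grep -E 'secret_key|instance_name' "$tmp_settings"
```

Generating the key on the spot means every instance ends up with its own secret, which is the whole point of replacing the default.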

Check

I personally didn't do this, but if you would like to check that searx was installed correctly, you can follow the steps below. All you have to do is enable debugging and start the webapp. Searx reads its configuration from the file named in the SEARX_SETTINGS_PATH environment variable, which is why it is exported as /etc/searx/settings.yml first. Don't forget to disable the debugging option again once you're done testing the instance.

# enable debug ..
$ sudo -H sed -i -e "s/debug : False/debug : True/g" /etc/searx/settings.yml

# start webapp
$ sudo -H -u searx -i
(searx)$ cd /usr/local/searx/searx-src
(searx)$ export SEARX_SETTINGS_PATH=/etc/searx/settings.yml
(searx)$ python searx/webapp.py

# disable debug
$ sudo -H sed -i -e "s/debug : True/debug : False/g" /etc/searx/settings.yml
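Since the debug flag gets switched on and then off again, it can be handy to wrap the substitution in a small helper and confirm the result with grep. This is a sketch only: set_debug is a name I made up, the one-line excerpt is an assumed copy of the debug line in settings.yml, and it relies on GNU sed for -i and -E.

```shell
# Set the debug flag in a searx settings file.
# $1: path to settings.yml, $2: True or False
set_debug() {
    sed -i -E -e "s/debug : (True|False)/debug : $2/" "$1"
}

# Rehearse on a scratch file instead of /etc/searx/settings.yml.
debug_demo=$(mktemp)
printf 'general:\n    debug : False # Debug mode, only for development\n' > "$debug_demo"

set_debug "$debug_demo" True
grep 'debug :' "$debug_demo"

set_debug "$debug_demo" False
grep 'debug :' "$debug_demo"
```

On the real file you would call it with sudo rights, and the final grep is a quick way to make sure debugging is really off again before moving on.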

Configuring uWSGI

To enable the searx app in uwsgi, create a symbolic link to its ini file:

$ sudo -H ln -s /etc/uwsgi/apps-available/searx.ini /etc/uwsgi/apps-enabled/

Afterwards, you have to create the ini file itself so that uwsgi can actually work; the configuration below is specific to Debian. Copy and paste the content below into the searx.ini file. Once that is done, be sure to restart uwsgi by running the command at the end of the listing.

$ sudo -H vim /etc/uwsgi/apps-available/searx.ini
[uwsgi]

# uWSGI core
# ----------
#
# https://uwsgi-docs.readthedocs.io/en/latest/Options.html#uwsgi-core

# Who will run the code
uid = searx
gid = searx

# set (python) default encoding UTF-8
env = LANG=C.UTF-8
env = LANGUAGE=C.UTF-8
env = LC_ALL=C.UTF-8

# chdir to specified directory before apps loading
chdir = /usr/local/searx/searx-src/searx

# searx configuration (settings.yml)
env = SEARX_SETTINGS_PATH=/etc/searx/settings.yml

# disable logging for privacy
disable-logging = true

# The right granted on the created socket
chmod-socket = 666

# Plugin to use and interpreter config
single-interpreter = true

# enable master process
master = true

# load apps in each worker instead of the master
lazy-apps = true

# load uWSGI plugins
plugin = python3,http

# By default the Python plugin does not initialize the GIL.  This means your
# app-generated threads will not run.  If you need threads, remember to enable
# them with enable-threads.  Running uWSGI in multithreading mode (with the
# threads options) will automatically enable threading support. This *strange*
# default behaviour is for performance reasons.
enable-threads = true


# plugin: python
# --------------
#
# https://uwsgi-docs.readthedocs.io/en/latest/Options.html#plugin-python

# load a WSGI module
module = searx.webapp

# set PYTHONHOME/virtualenv
virtualenv = /usr/local/searx/searx-pyenv

# add directory (or glob) to pythonpath
pythonpath = /usr/local/searx/searx-src


# speak to upstream
# -----------------
#
# Activate the 'http' configuration for filtron or activate the 'socket'
# configuration if you setup your HTTP server to use uWSGI protocol via sockets.

# using IP:
#
# https://uwsgi-docs.readthedocs.io/en/latest/Options.html#plugin-http
# Native HTTP support: https://uwsgi-docs.readthedocs.io/en/latest/HTTP.html

http = 127.0.0.1:8888

# using unix-sockets:
#
# On some distributions you need to create the app folder for the sockets::
#
#   mkdir -p /run/uwsgi/app/searx
#   chown -R searx:searx  /run/uwsgi/app/searx
#
# socket = /run/uwsgi/app/searx/socket

$ sudo -H service uwsgi restart searx
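The comments in the ini file say to activate either the http line or the socket line, so it is worth double-checking which of the two is actually uncommented. A small sketch for that, where uwsgi_upstream is a made-up helper name and the three-line scratch file only stands in for the real searx.ini:

```shell
# Print the uncommented upstream settings (http or socket) of a uWSGI ini file.
uwsgi_upstream() {
    grep -E '^[[:space:]]*(http|socket)[[:space:]]*=' "$1"
}

# Scratch file that mirrors the relevant lines of the listing above.
demo_ini=$(mktemp)
cat > "$demo_ini" <<'EOF'
[uwsgi]
http = 127.0.0.1:8888
# socket = /run/uwsgi/app/searx/socket
EOF

uwsgi_upstream "$demo_ini"
```

On the server you would run uwsgi_upstream /etc/uwsgi/apps-available/searx.ini instead; commented-out lines are skipped because the pattern only matches lines that start with the option name.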

The Nginx HTTP Server

As I mentioned before, this guide focuses primarily on nginx (note that the nginx-light package will not work with uwsgi), and I have pointed to the apache documentation above. The other two packages are for enabling HTTPS on your searx site, which is essential for searx or else it won't run properly. In general, you should always use the secure protocol anyway, because it encrypts the information exchanged with your webpage (in this case, a search engine).

$ sudo apt-get install nginx certbot python-certbot-nginx

An Nginx Searx Site

After nginx is installed, you have to create the configuration file and symlink it to /etc/nginx/sites-enabled/searx (this is where certbot will look when you enable SSL). If you haven't used nginx before: every time you want to create and enable a site, you first create its configuration file at /etc/nginx/sites-available/site and then symlink it to /etc/nginx/sites-enabled/site (replace site with whatever name you want). To learn more about nginx, have a look at the Beginner's Guide; the Getting Started Wiki is also a useful resource to have on hand.

This configuration file is created at /etc/nginx/sites-available/searx and is then symlinked to sites-enabled:

$ sudo -H ln -s /etc/nginx/sites-available/searx /etc/nginx/sites-enabled/searx
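If the sites-available / sites-enabled pattern is new to you, it can be rehearsed safely in a scratch directory before touching /etc/nginx. A sketch, in which the directory comes from mktemp and the one-line server block is just a placeholder:

```shell
# Rehearse the sites-available / sites-enabled pattern in a scratch directory.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/sites-available" "$demo_root/sites-enabled"

# 1. The configuration file lives in sites-available.
printf 'server { listen 80; }\n' > "$demo_root/sites-available/searx"

# 2. Enabling the site means linking it into sites-enabled.
ln -s "$demo_root/sites-available/searx" "$demo_root/sites-enabled/searx"

# 3. The link resolves back to the original file.
readlink "$demo_root/sites-enabled/searx"

# Disabling the site later only removes the link, not the configuration.
rm "$demo_root/sites-enabled/searx"
ls "$demo_root/sites-available"
```

The nice part of this layout is the last step: a site can be switched off without losing its configuration file.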

Next, open the configuration file /etc/nginx/sites-available/searx and copy the following content into it. The only thing you should change is server_name, which must be the domain where you will host your searx engine; don't forget to include the www variant. Both names should already have DNS records at your domain's registrar that resolve to the IP address of your VPS (for more info look under installation).

$ sudo -H vim /etc/nginx/sites-available/searx
server {
    server_name searx.stoisavljevic.com www.searx.stoisavljevic.com;

    listen 80;
    listen [::]:80;

    location / {
        include uwsgi_params;
        uwsgi_pass unix:/run/uwsgi/app/searx/socket;
    }

    root /usr/local/searx/searx-src/searx;
    location /static { }
}
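Since server_name is the only thing that changes from one setup to the next, the server block can also be generated for a given domain instead of edited by hand. A minimal sketch, where render_searx_site is a name I made up and searx.example.org stands in for your own domain:

```shell
# Print the searx server block for the given domain (plain HTTP, before certbot).
render_searx_site() {
    domain=$1
    cat <<EOF
server {
    server_name $domain www.$domain;

    listen 80;
    listen [::]:80;

    location / {
        include uwsgi_params;
        uwsgi_pass unix:/run/uwsgi/app/searx/socket;
    }

    root /usr/local/searx/searx-src/searx;
    location /static { }
}
EOF
}

render_searx_site searx.example.org
```

To write the result straight into place, the output could be piped through sudo tee /etc/nginx/sites-available/searx.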

Usually the folder for the unix sockets will not exist yet, so you will need to create it. If it already exists, don't worry: mkdir -p does nothing in that case, so running the command again is harmless. It is also suggested that you make the searx user and group the owner of that folder:

$ sudo -H mkdir -p /run/uwsgi/app/searx/
$ sudo -H chown -R searx:searx /run/uwsgi/app/searx/

Disable Logs

By default, logging is enabled for your nginx webserver unless you have already disabled it. To enhance your privacy, you can turn off the log files nginx creates automatically by editing /etc/nginx/nginx.conf: look for the section labelled Logging Settings and change it to the two lines below:

$ sudo -H vim /etc/nginx/nginx.conf
http {
    ##
    # Logging Settings
    ##

    access_log /dev/null;
    error_log  /dev/null;

    ## ...
}

Finally, be sure to restart the services by running:

$ sudo -H systemctl restart nginx
$ sudo -H service uwsgi restart searx

Enabling SSL

You have now essentially configured searx, but you are one step away from actually being able to use it. By default, you can't use searx without SSL (Secure Sockets Layer; its modern successor is called TLS), which is what you are now going to enable. If you don't already know, HTTPS is just HTTP (HyperText Transfer Protocol) carried over SSL/TLS, which establishes an encrypted connection between the user and the site. For this you need certbot and python-certbot-nginx, which should already be installed if you have been following the guide. If they aren't, those are the two packages you want.

$ sudo certbot --nginx

Just to clarify, certbot is completely free and gives you HTTPS for whatever site you want to use it on. Everyone should use it; there is no need to pay for any other kind of SSL certificate. I have used it for all my sites and never had any issues. Certbot uses Let's Encrypt and handles all the signing for you (the certificates are valid for about 3 months and are renewed automatically). I don't feel the usage of certbot needs much explaining: just make sure you select the right domains and enable the HTTPS redirect when asked. That way, if people reach your domain over plain HTTP (e.g., they typed http:// by accident), they will automatically be redirected to HTTPS.
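Once certbot is done, you may want to confirm that plain HTTP requests really are redirected. One way is to look at the response headers; the sketch below reads headers from standard input and only checks for a 301, 302, 307 or 308 status together with an https Location. The function name is_https_redirect is made up, and the headers in the demonstration are canned rather than fetched from a real server.

```shell
# Succeed if the HTTP response headers on stdin describe a redirect to HTTPS.
is_https_redirect() {
    # Drop carriage returns so the patterns below match cleanly.
    headers=$(tr -d '\r')
    # The status line must be a redirect ...
    printf '%s\n' "$headers" | head -n 1 | grep -Eq '^HTTP/[0-9.]+ 30[1278]' || return 1
    # ... and the Location header must point at an https:// address.
    printf '%s\n' "$headers" | grep -Eiq '^location: https://'
}

# Demonstration with canned headers.
printf 'HTTP/1.1 301 Moved Permanently\r\nLocation: https://searx.example.org/\r\n\r\n' \
    | is_https_redirect && echo "redirects to HTTPS"
```

Against a live site, the headers could come from curl -sI http://your-domain piped into is_https_redirect.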

Congratulations

After a lengthy journey, you have now successfully configured searx and enabled HTTPS using certbot. You are finally free from data-harvesting organizations like Google, as long as you never touch their search engine again, that is. No more biased search results either, since searx draws on many different search engines for its results. This is by far one of the best ways to search the web privately, without massive organizations constantly harvesting your data and profiting from it. Whenever you can, try to run your own software on your own server; the same logic applies to VPNs, email, SMS, etc. You can also tell your friends and family members to use your searx instance, as that will greatly improve their privacy as well. At last, you are able to freely surf the web using your very own private search engine.