Documentation
Contents
- Adding a site to the public search
- What types of site can be submitted to the public search?
- How do I add a site to the public search?
- How do I check on the status of a Basic listing I submitted?
- Adding a site to the search as a service
- What benefits do I get from a Full listing?
- Does my site have to appear in the public search if I want to use the search as a service?
- Can I change a Basic listing to a Full listing?
- How do I verify ownership of my site?
- Do I have to complete a Full listing in one session?
- Using the search
- What is the query syntax?
- What fields are available?
- Managing my site
- How do I add a search box to my site?
- What is the specification for the API?
- How does the indexing process work?
- How frequently are sites reindexed?
- How does the relevancy tuning work?
- Support
- How do I raise a support query?
- What are the support hours for the search as a service?
Adding a site to the public search
What types of site can be submitted to the public search?
Users are encouraged to submit any personal and independent websites which they believe will improve the public search, not just their own sites. See the Submission Guidelines section in the Terms of Use for further details, including a definition of personal and independent websites.
How do I add a site to the public search?
Use the Add Site link at the top. Anyone can submit any site for a Basic listing, although it has to be approved by a moderator before it is indexed. Note that only site owners can submit a Free Trial or Full listing, because the submission process requires verifying ownership of the site.
How do I check on the status of a Basic listing I submitted?
Because there is no user tracking for a Basic listing (see Privacy Policy), it isn't possible to notify you directly of changes to your submission. However, you can resubmit the site and check the message returned, which will tell you whether it is now being indexed, is still pending review, or has been rejected (along with the rejection reason).
Adding a site to the search as a service
What benefits do I get from a Full listing?
The Full listing provides access to the search as a service features, such as more frequent indexing, indexing of a higher number of pages, access to Manage Site to configure indexing and trigger on-demand reindexing, and the API. See Add Site for further details.
Does my site have to appear in the public search if I want to use the search as a service?
No. When listing your site, one of the questions is "Include in public search", to which you can select No. This allows you to use the search as a service even if your site isn't a personal or independent website. Note that, as per the Terms of Use, a moderator may exclude a Full listing from the public search.
Can I change a Basic listing to a Full listing?
Yes. Just resubmit the site, and select Full instead of Basic for the Listing tier.
How do I verify ownership of my site?
The easiest way to prove ownership is to use IndieAuth. If you don't have that set up, you can still verify ownership with a process similar to one you may have used for other services, i.e. placing a specific piece of content in the domain's root or in a DNS TXT record. The Add Site link will guide you through the process.
Do I have to complete a Full listing in one session?
No. For a Full listing, if you are unable to complete the process in one session, you can resubmit at a later time to pick up where you left off.
Using the search
What is the query syntax?
- Individual words: e.g. antarctica. If there are two words, e.g. antarctica book, the search will look for pages containing both words, but not necessarily as a phrase. If there are three or more words, e.g. book about antarctica, it will return pages containing at least two of the words, so in this example results could include pages containing "book" and "about" but not "antarctica".
- Phrase search: enclose a phrase in double quotes to search for that exact phrase, e.g. "book about antarctica".
- Boolean search: use AND, &&, NOT, !, OR, ||, + and -, with ( and ) to group queries, e.g. for pages which contain both antarctica and book use antarctica AND book, or for pages with antarctica and book but not movie use antarctica AND book !movie.
- Wildcard search: use * to match zero or more characters and ? to match a single character, e.g. *arctic*.
- Filters: use name:value, e.g. for all the pages on the michael-lewis.com domain which contain the word antarctica use domain:michael-lewis.com AND antarctica, or for all the article type pages on the michael-lewis.com domain use domain:michael-lewis.com AND page_type:article. See below for the full list of field names.
- Other searches: for fuzzy searches, proximity searches, range searches, boosting, etc., see The Standard Query Parser and The Extended DisMax Query Parser.
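The default rule for unquoted words can be summarised as "a page must contain min(number of words, 2) of the query words". The sketch below is a toy illustration of that rule only, assuming simple case-insensitive whole-word matching. The function names are hypothetical, and the real query parser also applies tokenisation, stemming and relevancy scoring, so actual results will differ.

```python
import re


def min_words_required(num_words: int) -> int:
    """One word: 1. Two words: both. Three or more: at least two."""
    return min(num_words, 2)


def toy_matches(query: str, page_text: str) -> bool:
    """Hypothetical matcher for unquoted queries without operators.

    Uses case-insensitive whole-word matching only: no stemming,
    stop words, phrases, wildcards or Boolean operators.
    """
    query_words = query.lower().split()
    page_words = set(re.findall(r"\w+", page_text.lower()))
    found = sum(1 for word in query_words if word in page_words)
    return found >= min_words_required(len(query_words))


# Usage, with queries from the list above
print(toy_matches("book about antarctica", "A book about penguins"))  # True
print(toy_matches("antarctica book", "Antarctica travel guide"))      # False
```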
What fields are available?
Name | Notes |
---|---|
id | URL of the web page, before following any redirects. Will be unique. |
url | URL of the web page, after following any redirects. Will be the same as id if there are no redirects, and might not be unique if there are redirects. |
domain | The domain to which the page belongs. |
is_home | Boolean value, i.e. true or false. If true, indicates that the page is the home page for the domain. |
title | Extracted from the title tag. |
author | Extracted from meta name="author". |
description | Extracted from meta name="description" or meta property="og:description". |
tags | Multivalued. Extracted from meta name="keywords" or meta property="article:tag". |
content | Text extracted from the main tag, or article tag, or body tag, with text from any nav, header and/or footer tags removed. |
page_type | Extracted from meta property="og:type" or article data-post-type=. |
page_last_modified | Extracted from the Last-Modified HTTP header. |
published_date | Extracted from meta property="article:published_time" or meta name="dc.date.issued" or meta itemprop="datePublished". |
date_domain_added | Date and time the domain was first added to the system for indexing. Only present on pages where is_home=true. |
owner_verified | Boolean value, i.e. true or false. If true, indicates that the page is from a site which has been verified by the owner. |
contains_adverts | Boolean value, i.e. true or false. If true, indicates that adverts have been detected on the page. |
language | Extracted from html lang=. |
language_primary | Language family, derived from the language attribute, e.g. if language=en-GB then language_primary=en. |
indexed_inlinks | Multivalued. Pages which link to this page (from other domains within the search index, i.e. not from this domain or domains which aren't indexed). |
indexed_outlinks | Multivalued. Pages to which this page links (to other domains within the search index, i.e. not to this domain or domains which aren't indexed). |
indexed_inlink_domains | Multivalued. Unique domains in indexed_inlinks. |
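The filter examples above can also be assembled programmatically, e.g. to generate links to pre-filtered results. This is a minimal sketch, assuming the public search page at https://searchmysite.net/search/ (the form action used in the search box example) accepts the full query syntax in its q parameter. build_query and search_page_url are hypothetical helper names.

```python
from urllib.parse import urlencode

SEARCH_PAGE = "https://searchmysite.net/search/"


def build_query(keywords: str = "", **filters: str) -> str:
    """Combine name:value filters and optional keywords with AND.

    Multi-word keywords are wrapped in ( and ) so that AND applies to the
    group as a whole. Field names should come from the table above.
    """
    parts = [f"{name}:{value}" for name, value in filters.items()]
    keywords = keywords.strip()
    if keywords:
        parts.append(f"({keywords})" if " " in keywords else keywords)
    return " AND ".join(parts)


def search_page_url(query: str) -> str:
    """URL-encode a query into the q parameter of the public search page."""
    return SEARCH_PAGE + "?" + urlencode({"q": query})


q = build_query("antarctica", domain="michael-lewis.com")
print(q)                   # domain:michael-lewis.com AND antarctica
print(search_page_url(q))  # https://searchmysite.net/search/?q=domain%3Amichael-lewis.com+AND+antarctica
```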
Managing my site
How do I add a search box to my site?
If you have a Full listing (or a Free Trial), the API will be enabled. You can call the API from a server-side search page, or client-side, e.g. as per the very basic example at Adding a simple search page to my personal website with searchmysite.net.
If you only have a Basic listing, or don't want to use the API, a simple alternative is a form which submits the query along with a hidden domain parameter containing the domain to which you want to restrict results, e.g. (for michael-lewis.com):
```html
<form action="https://searchmysite.net/search/">
  <input type="search" name="q">
  <!-- Restrict results to this domain -->
  <input type="hidden" name="domain" value="michael-lewis.com">
  <input type="submit" value="Search">
</form>
```
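When the form above is submitted, the browser sends a GET request (the default method for a form) with the named fields URL-encoded in document order. The sketch below reproduces that URL, e.g. for linking directly to results from elsewhere. site_search_url is a hypothetical name, and it assumes the same search page and parameters as the form.

```python
from urllib.parse import urlencode


def site_search_url(query: str, domain: str) -> str:
    """Build the URL a browser produces when submitting the search form.

    Only named fields are sent (the unnamed submit button is not), and
    urlencode encodes spaces as '+', as browsers do for form data.
    """
    params = {"q": query, "domain": domain}  # same order as the inputs
    return "https://searchmysite.net/search/?" + urlencode(params)


print(site_search_url("book about antarctica", "michael-lewis.com"))
# https://searchmysite.net/search/?q=book+about+antarctica&domain=michael-lewis.com
```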
What is the specification for the API?
In summary, queries take the form /api/v1/search/<domain>?q=*, where the parameters are:
- <domain>: the domain being searched (mandatory)
- q: query string (mandatory)
- page: the page number from which multi-page results should start (optional, default 1)
- resultsperpage: the number of results per page (optional, default 10)
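A minimal Python client for the API might look like the sketch below. It is not an official library: it assumes the API is served from https://searchmysite.net (the path above does not state a host), that the domain has a Full listing or Free Trial so the API is enabled, and that responses are JSON in the format described below. api_url and api_search are hypothetical names.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API_BASE = "https://searchmysite.net/api/v1/search/"  # assumed host


def api_url(domain: str, q: str, page: int = 1, resultsperpage: int = 10) -> str:
    """Build /api/v1/search/<domain>?q=...&page=...&resultsperpage=..."""
    params = {"q": q, "page": page, "resultsperpage": resultsperpage}
    return API_BASE + quote(domain, safe="") + "?" + urlencode(params)


def api_search(domain: str, q: str, page: int = 1,
               resultsperpage: int = 10, timeout: float = 10.0) -> dict:
    """Call the API and decode the JSON response (raises on HTTP errors)."""
    with urlopen(api_url(domain, q, page, resultsperpage), timeout=timeout) as resp:
        return json.load(resp)


print(api_url("michael-lewis.com", "antarctica", page=2))
# https://searchmysite.net/api/v1/search/michael-lewis.com?q=antarctica&page=2&resultsperpage=10
```

Sending page and resultsperpage explicitly, even when they equal the defaults, keeps the request self-describing; they could equally be omitted.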
Results are returned in the following format, with all fields optional apart from id and url:
```json
{
  "params": {
    "q": "*",
    "page": 1,
    "resultsperpage": 10
  },
  "totalresults": 40,
  "results": [
    {
      "id": "https://server/path",
      "url": "https://server/path",
      "title": "Page title",
      "author": "Author",
      "description": "Page description",
      "tags": ["tag1", "tag2"],
      "page_type": "Page type, e.g. article",
      "page_last_modified": "2020-07-17T00:00:00+00:00",
      "published_date": "2020-07-17T00:00:00+00:00",
      "language": "en",
      "indexed_inlinks": ["inlink1", "inlink2"],
      "indexed_outlinks": ["outlink1", "outlink2"],
      "fragment": ["text before the search ", "query", " and text after"]
    }
  ]
}
```
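Once decoded, a response in this format can be turned into a simple results listing. The sketch below assumes fragment has the [text before, matched text, text after] shape shown in the example (it falls back to joining the list otherwise). total_pages, snippet and render_result are hypothetical names.

```python
import math


def total_pages(response: dict) -> int:
    """Number of result pages, from totalresults and resultsperpage."""
    per_page = response["params"]["resultsperpage"]
    return math.ceil(response["totalresults"] / per_page)


def snippet(result: dict) -> str:
    """Join the fragment, wrapping the matched text in [ and ]."""
    fragment = result.get("fragment") or []
    if len(fragment) == 3:
        before, match, after = fragment
        return f"{before}[{match}]{after}"
    return "".join(fragment)


def render_result(result: dict) -> str:
    """Title (falling back to the URL, which is always present), URL, snippet."""
    title = result.get("title") or result["url"]
    return f"{title} ({result['url']})\n{snippet(result)}"


example = {
    "params": {"q": "*", "page": 1, "resultsperpage": 10},
    "totalresults": 40,
    "results": [{
        "id": "https://server/path",
        "url": "https://server/path",
        "title": "Page title",
        "fragment": ["text before the search ", "query", " and text after"],
    }],
}
print(total_pages(example))  # 4
for r in example["results"]:
    print(render_result(r))
```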
How does the indexing process work?
The indexing process first checks the site's robots.txt, and obeys any rules there. If robots.txt allows, it then loads the home page and web feed (if a web feed was configured or discovered on a previous index) and follows the links it finds breadth first, until there are no further pages to index, or until the indexing page limit for the domain or the timeout is reached.
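The process described above is a breadth-first traversal with three stopping conditions. The sketch below models it over an in-memory link graph instead of fetching real pages, so it runs offline. It uses Python's standard urllib.robotparser for the robots.txt check; the crawl function, the link graph and the same-domain restriction are simplifying assumptions, not the actual indexer's code.

```python
import time
from collections import deque
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def crawl(start_urls, links, robots_lines, page_limit, timeout_s=60.0, agent="*"):
    """Breadth-first crawl of a single domain (hypothetical model).

    start_urls:   home page first, then any web feed URL
    links:        dict mapping a URL to the URLs it links to (stands in for fetching)
    robots_lines: lines of the site's robots.txt
    Stops when the queue is empty, page_limit pages are indexed, or the timeout passes.
    """
    robots = RobotFileParser()
    robots.parse(robots_lines)
    robots.modified()  # can_fetch() treats rules as unloaded until this timestamp is set
    domain = urlparse(start_urls[0]).netloc
    deadline = time.monotonic() + timeout_s
    queue, seen, indexed = deque(start_urls), set(start_urls), []
    while queue and len(indexed) < page_limit and time.monotonic() < deadline:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # obey robots.txt
        indexed.append(url)
        for link in links.get(url, []):
            # follow only unseen links on the same domain
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


site = {
    "https://example.com/": ["https://example.com/a",
                             "https://example.com/private/x",
                             "https://other.org/"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/feed.xml": ["https://example.com/c"],
}
robots_txt = ["User-agent: *", "Disallow: /private/"]
print(crawl(["https://example.com/", "https://example.com/feed.xml"], site, robots_txt, page_limit=10))
# ['https://example.com/', 'https://example.com/feed.xml', 'https://example.com/a', 'https://example.com/c', 'https://example.com/b']
```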
If you want to exclude certain pages from indexing, you would normally do this via robots.txt. If you have a Full listing you can also configure your listing to exclude content based on:
- path: i.e. URLs containing a certain string.
- type: i.e. values from the page_type field described above.
This might be useful, for example, to filter out microblog entries which have a particular path or type.
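The two exclusion rules can be expressed as a simple predicate. This is a sketch of the behaviour described above, assuming a plain substring match for path and an exact match for type. is_excluded and the example values (/notes/, note) are hypothetical, and in practice the exclusions are configured via Manage Site rather than in code.

```python
from typing import Iterable, Optional


def is_excluded(url: str, page_type: Optional[str],
                excluded_paths: Iterable[str], excluded_types: Iterable[str]) -> bool:
    """Return True if a page should be skipped.

    path rule: the URL contains any configured string.
    type rule: the page_type field equals any configured value.
    """
    if any(path in url for path in excluded_paths):
        return True
    return page_type is not None and page_type in set(excluded_types)


# Hypothetical configuration: skip microblog notes by path or by type
print(is_excluded("https://example.com/notes/123", None, ["/notes/"], ["note"]))       # True
print(is_excluded("https://example.com/posts/1", "article", ["/notes/"], ["note"]))    # False
```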
How frequently are sites reindexed?
See Add Site for the latest information on indexing frequency. If you have a Full listing, you can log in to Manage Site to see when your next reindex is due, and can of course trigger a reindex on demand.
How does the relevancy tuning work?
The following fields are used to determine how results are ranked: title, description, author, tags, url, content, indexed_inlink_domains_count, contains_adverts and owner_verified. There is further discussion of the relevancy tuning in some of the Blog posts, and of course the Source code is available for complete transparency.
Support
How do I raise a support query?
Use the Contact page to get in touch. If you think you have found a bug, you can also raise it via https://github.com/searchmysite/searchmysite.net/issues.
What are the support hours for the search as a service?
As per About searchmysite.net, this is an evenings-and-weekends side project, so support is only available outside normal business hours, and I'm also periodically away on holiday. I believe this is reflected in the low cost. Note, however, that the service has been running reliably since July 2020.