FAQ relating to the
WWW Search Interfaces for Translators
for finding glossaries and parallel texts on the internet
http://www.multilingual.ch


QUESTIONS

Glossary search interfaces

--

Parallel text search interfaces

--

Other (or both of the above)

  1. Why are there so many different techniques to choose from for each language?

  2. Which search engine is recommended for a glossary/parallel text search?

  3. Why is it that the results list for internet searches includes pages on which my search term cannot be found, either on the page or in the source text (HTML)?


ANSWERS

Glossary search interfaces

--

Parallel text search interfaces

--

Other (or both of the above)

  1. Why are there so many different techniques to choose from for each language?

    Each search technique uses different search criteria. The techniques had to be split into separate searches because Altavista crashes when a search string is too long, and Google also limits the length of search queries. Generally speaking, however, the first technique listed is the most effective.
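    The splitting described above can be sketched roughly as follows. This is not the interface's actual code: the function names, the query syntax and the 60-character limit are hypothetical, and real search engines set their own (different) limits.

```python
def build_query(term, criteria):
    # Quote the term and OR the criteria together inside brackets
    # (assumed syntax, for illustration only).
    return f'"{term}" AND ({" OR ".join(criteria)})'


def split_into_queries(term, criteria, max_len=60):
    """Pack criteria into as few queries as possible, each <= max_len chars.

    A single criterion that is too long on its own still gets its own
    query, so that query may exceed the (hypothetical) limit.
    """
    queries, current = [], []
    for criterion in criteria:
        candidate = current + [criterion]
        if current and len(build_query(term, candidate)) > max_len:
            # Adding this criterion would exceed the limit: close the query.
            queries.append(build_query(term, current))
            current = [criterion]
        else:
            current = candidate
    if current:
        queries.append(build_query(term, current))
    return queries


for q in split_into_queries(
    "outperformer",
    ["title:glossary", "title:glossaire", "title:glossar", "title:dictionary"],
):
    print(q)
```

    Each printed query can then be submitted as a separate search, which is the trade-off the interface makes: several shorter searches instead of one query the engine would reject.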


  2. Which search engine is recommended for a glossary/parallel text search?

    Take your pick:

    Altavista:
    PROS: allows longer, more sophisticated (and therefore more focused) search queries; supports the following syntax: NEAR (Boolean operator), * (wildcard for 0-10 additional characters) and brackets ( ); is case-sensitive;
    CONS: searches are restricted to .htm, .html, .txt files;
      
    Google:
    PROS: covers many file types: pdf, ps, htm, html, wk1, wk2, wk3, wk4, wk5, wki, wks, wku, lwp, mw, asp, xls, ppt, doc, wps, wdb, wri, rtf, ans, txt;
    CONS: limits the length of search queries; does not allow NEAR (Boolean operator) or * (wildcard for letters); is not case-sensitive.

    If you fail to find anything with one search engine, try the other.
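    To see which words an Altavista-style pattern such as glossar* would cover, the * wildcard (0-10 additional characters, as described above) can be imitated with a regular expression. This is only an approximation under stated assumptions: "characters" is taken to mean word characters (\w), matching is case-sensitive, and it is not how Altavista itself implements the wildcard.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern like 'glossar*' into a compiled regex.

    Assumption: each '*' stands for 0-10 additional word characters;
    everything else is matched literally and case-sensitively.
    """
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(r"\b" + r"\w{0,10}".join(parts) + r"\b")


rx = wildcard_to_regex("glossar*")
print(rx.findall("A glossary, two glossaries and a Glossar entry."))
```

    The capitalised "Glossar" is not matched because this sketch is case-sensitive; a word with more than ten extra letters after "glossar" is not matched either.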


  3. Why is it that the results list for internet searches includes pages on which my search term cannot be found, either on the page or in the source text (HTML)?

    You have searched for a term using, say, Altavista. Altavista lists pages that presumably contain your term. You open a page, search for your term on that page with CTRL-F (or Edit/Find) and ... it's not there! Don't panic:

    It could be because:


    a) The webmaster has changed the page (perhaps removing or modifying the term or phrase you searched for) since Altavista last "indexed" it, i.e. looked at it and entered the page's details in its database. NOTE: When you search with Altavista, you are NOT searching the web itself but Altavista's database of information about pages on the web; only when you select a page from the results list does Altavista direct you to the actual web page. Altavista updates its index about once a month.
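    A toy model makes the point concrete: the engine answers from a snapshot it built earlier, not from the live page. The URL, page texts and word-splitting rule below are hypothetical, and a real search engine index is far more elaborate.

```python
def build_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


URL = "http://example.com/glossary"  # hypothetical page

# Snapshot taken when the crawler last visited the page.
snapshot = {URL: "outperformer glossary entry"}
index = build_index(snapshot)

# Later, the webmaster edits the live page; the index is not yet updated.
live = {URL: "updated glossary entry"}

hits = index.get("outperformer", set())
print(URL in hits)                            # the engine still lists the page
print("outperformer" in live[URL].split())    # but the term is gone from it
```

    Until the next crawl, the results list and the live page disagree, which is exactly the situation described in a).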


    b) You are being forwarded (redirected) to another page, sometimes so quickly that you don't notice. Your term may have been on the first page, which appears only briefly, but not on the page to which you are redirected. This often happens when the page you reach is no longer available on a particular website: instead of showing you a "404 Page cannot be found" error message, the website redirects you to another page (usually the home page or some other main page at a specific level of the site) so that you can still find your way around.
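    One common way a page forwards you is a refresh instruction in its HTML source. The sketch below, using Python's standard html.parser, reports the target of a <meta http-equiv="refresh"> tag; the function and class names are hypothetical, and it does not detect redirects sent by the web server itself (HTTP 301/302 responses), which never appear in the page source.

```python
from html.parser import HTMLParser


class RefreshFinder(HTMLParser):
    """Record the target URL of a <meta http-equiv="refresh"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=http://..." (delay; target).
        for part in attrs.get("content", "").split(";")[1:]:
            key, _, value = part.strip().partition("=")
            if key.lower() == "url":
                self.target = value.strip().strip("'\"")


def find_refresh_target(html):
    finder = RefreshFinder()
    finder.feed(html)
    return finder.target


page = '<html><head><meta http-equiv="Refresh" content="0; URL=http://example.com/"></head></html>'
print(find_refresh_target(page))
```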


    c) Say you searched Altavista for "home-made", but the page actually contains "home made" (without the hyphen). If you then search the page for "home-made" (with CTRL-F), you will be told that the term does not exist! TIP: search for "home" or "made" only.
    The same can happen with words containing special characters: a search for pages containing "ecole" will also find pages containing "école", but if you then search the page for "ecole" you will not find it, because the page actually contains "école".
    Conversely, a search for pages containing "canhão" (with the special character) may find pages that contain "canhao" (without it), so be flexible and bear this in mind.
    This happens because search engines generally ignore punctuation (such as hyphens) and accents when indexing a page.
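    The kind of folding described in c) can be imitated to check a page "loosely": remove accents, treat hyphens as spaces, then compare. This is a sketch under the assumption that an engine folds text roughly this way; each engine has its own rules, and the function names are hypothetical.

```python
import unicodedata


def normalize(text):
    """Strip accents and treat hyphens as spaces (assumed indexing rules)."""
    # NFD splits "é" into "e" + a combining accent, which we then drop.
    decomposed = unicodedata.normalize("NFD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse hyphens and runs of whitespace into single spaces.
    return " ".join(no_accents.replace("-", " ").split())


def loosely_contains(page_text, term):
    return normalize(term) in normalize(page_text)


print("home-made" in "home made cake")                   # plain CTRL-F style
print(loosely_contains("home made cake", "home-made"))   # folded comparison
print(normalize("canhão"))
```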


    d) If your query was, for example, title:glossary and (outperformer OR out-performer)
    (with two alternative spellings of the term "out-performer"), then when you look for your term on a page, search for a string of characters that both spellings have in common (e.g. "performer"), since you do not yet know which spelling the page contains.
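    Finding the common string can be automated with difflib from Python's standard library. For two spellings, SequenceMatcher.find_longest_match (with autojunk disabled) returns their longest common substring; for three or more, the pairwise reduction below is only a heuristic. The function name is hypothetical.

```python
from difflib import SequenceMatcher
from functools import reduce


def longest_common(a, b):
    """Longest substring shared by a and b."""
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b)
    )
    return a[m.a:m.a + m.size]


def common_core(spellings):
    # Exact for two spellings; a heuristic when there are more.
    return reduce(longest_common, spellings)


print(common_core(["outperformer", "out-performer"]))
```

    Searching the page with CTRL-F for the returned string finds whichever spelling the page uses.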


    e) When searching for your term on a page (with CTRL-F), make sure you have not typed an extra space before or after it: if you search for "home " and "home" is at the end of a sentence (followed by a full stop) or followed by a hyphen, you will not find it!
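    The effect of a stray space is easy to reproduce with a plain substring search, which behaves like CTRL-F here. The helper name is hypothetical; it simply strips surrounding spaces before searching.

```python
def find_on_page(page_text, term):
    """Return the position of term in page_text, or -1 if absent.

    Surrounding spaces are stripped first, because "home " cannot
    match "home." or "home-made".
    """
    return page_text.find(term.strip())


page = "Everything here is made at home."
print(page.find("home "))           # the stray space prevents a match
print(find_on_page(page, "home "))  # found once the space is removed
```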


    f) Check your spelling: are you looking for exactly the same term as you entered in the search engine?


    g) Some words are ignored by certain search engines.
    Example: until recently, Google ignored common words such as in, of, de, la etc., so if you searched for the phrase "in reliance of", you were actually searching for pages containing "reliance" only.
    This is no longer the case with Google when you search for a phrase in quotation marks (" ").

    Sometimes you CAN force a search engine to accept these "stop words" by adding a "+" sign in front of the common terms:

         "+in reliance +of"

    (the quotes indicate a phrase, and the + signs force the search engine to include the stop words).
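    The behaviour described in g) can be modelled as a simple filter: stop words are dropped unless they carry a leading "+". The stop-word list and function name below are hypothetical; real engines use their own, much longer lists and more complex rules.

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"in", "of", "de", "la", "the", "a"}


def effective_terms(query):
    """Words a stop-word-dropping engine would actually search for.

    A word prefixed with '+' is kept even if it is a stop word.
    """
    terms = []
    for word in query.strip('"').split():
        if word.startswith("+"):
            terms.append(word[1:])
        elif word.lower() not in STOP_WORDS:
            terms.append(word)
    return terms


print(effective_terms('"in reliance of"'))
print(effective_terms('"+in reliance +of"'))
```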


    h) Sometimes the words you searched for are not visible on the page because they are hidden (in meta tags, alternative text for images, etc.). To see them, view the page's source code by selecting "View HTML" (called "View Source" or "View Page Source" in many browsers).
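    Hidden text of this kind can also be pulled out of the source with Python's standard html.parser. This sketch only collects meta-tag content and image alt text (the two cases named in h); the class and function names are hypothetical, and other hiding places are not covered.

```python
from html.parser import HTMLParser


class HiddenTextFinder(HTMLParser):
    """Collect <meta content="..."> values and <img alt="..."> text."""

    def __init__(self):
        super().__init__()
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("content"):
            self.hidden.append(attrs["content"])
        elif tag == "img" and attrs.get("alt"):
            self.hidden.append(attrs["alt"])


def term_in_hidden_text(html, term):
    finder = HiddenTextFinder()
    finder.feed(html)
    return any(term in text for text in finder.hidden)


html = (
    '<html><head><meta name="keywords" content="outperformer, glossary"></head>'
    '<body><img src="x.gif" alt="home-made bread"><p>Welcome</p></body></html>'
)
print(term_in_hidden_text(html, "outperformer"))
```

    Visible page text such as "Welcome" is deliberately not collected, so a match here means the term exists only in the hidden parts of the source.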
