ZhGrep - Unicode Plain-Text File Full-Text Search Tool

Introduction


[For the impatient, here is the download page: Download DAAV ZhGrep]

Have you ever scratched your head and pulled your hair because you have completely forgotten where on your hard disk you put your favorite song, and the only thing you remember is part of its lyrics, which fortunately were saved in a lyric file by a media player helper program?

Or perhaps you have extracted an open-source PHP archive that takes up more than 20MB on the hard disk, which is not uncommon nowadays, and now you want to find out which of those thousands of files contains a particular string. Development IDEs and tools often search only open files, not the unopened ones sleeping in the installation directory.

On recent versions of Windows, you could use the search button of the system file explorer. However, in our experience, it is buggy and from time to time crashes the whole file explorer, including its background process (you notice this because the desktop icons suddenly disappear for a while and then reappear once the background file explorer process has restarted).

Alternatively, you could install a monster desktop search component like those Microsoft and Google offer, but many people do not want to. All you need is a small, decent full-text search tool for plain-text files! This is what our ZhGrep program can do for you.

Features


The short name of the program, ZhGrep, is borrowed from the classic Unix/Linux/BSD command grep. Apart from the name and helping people achieve a similar goal, the program has nothing else to do with grep.

Cross-Platform


Since version 0.107, DAAV ZhGrep runs on both Microsoft Windows and Linux (tested on Ubuntu).

Multi-Lingual


The prefix "Zh" stands, on the one hand, for the Chinese language, the mother tongue of its developer and the language in which we most often search files. On the other hand, we use it as a symbol that this program can process any language you can input on your computer, since Chinese is a typical example of the many languages around the world that require a non-ASCII multi-byte encoding scheme to be stored on disk.

The software should work without problems for all languages that your Windows supports. Please let us know if you find anything unusual. Thanks.

Encoding Auto-Detection


Despite the long name "DAAV Full-Text Search Tool for Unicode Plain-Text Files" containing "Unicode", ZhGrep is not limited to searching files saved in a Unicode format (UTF-8, UCS-2/UTF-16 LE/BE, UCS-4/UTF-32). It also reads and searches files saved in the local encodings used for French, German, Arabic, Hebrew, Japanese, and Korean (strictly speaking, these are languages rather than encodings, but each has its own local encodings), just to name a few.

When ZhGrep processes a text file, it auto-detects the encoding in which the file was saved. Even for a single language, a text file can be saved in any of several encodings. An English plain-text file can be saved in ASCII, ISO-8859-1, or UTF-8/16/32, as its creator sees fit. The same holds for Chinese, French, German, and any other language.

Fortunately, ZhGrep relies on internal algorithms to make this detection automatic. ZhGrep then uses the detected encoding to read and display the content of the matched plain-text file. ZhGrep also lets the user view the content in different encodings; some will work, while others may just produce garbled text. In that case, switch back to the right encoding.

In most cases, the encoding auto-detection algorithms work pretty well. If you find that the program misdetects the encoding of your plain-text file, please do not hesitate to let us know so that we can improve the software in future releases.
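
To make the idea concrete, here is a minimal Python sketch of one common way to guess an encoding: look for a Unicode byte-order mark (BOM) first, then try a short list of candidate encodings until one decodes cleanly. It only illustrates the idea and is not the detection code inside ZhGrep; the function name guess_encoding and the fallback list are assumptions made for this example.

```python
import codecs

# BOMs are checked longest-first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, fallbacks=("utf-8", "gb18030", "latin-1")):
    """Return the name of a codec that can decode `data`.

    Hypothetical helper: the fallback order is an assumption, and a
    clean decode does not prove that the guess is the right one.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for name in fallbacks:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so it never fails.
    return "latin-1"


raw = "âme".encode("utf-8")
text = raw.decode(guess_encoding(raw))
```

Decoding the same bytes with the wrong encoding is what produces garbled text: the UTF-8 bytes of "âme" read as ISO-8859-1 come out as "Ã¢me".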

Filename Search Capability


As a must-have bonus, this program also includes filename search.

In short, ZhGrep does full-text search on plain-text files in any language, helping you locate the ones containing your given keywords.

Checked File Types


ZhGrep looks at the suffix of a filename to decide whether to include the file as plain text in a full-text search. However, files are sometimes not named according to this convention; e.g., a file named "binary.txt" may well contain only binary data in spite of its ".txt" suffix. In this case, ZhGrep still processes the file, but it will almost certainly not match any of your keywords and thus will not appear in the result file list.

Attention: When searching for filenames, ZhGrep does not exclude non-plain-text files; all files are searched in this case.

Currently, ZhGrep by default checks files with the following suffixes when doing a full-text search, assuming they are all plain-text. (Please note, .doc and .pdf are not plain-text files!)
  • *.txt, *.text
  • *.html, *.htm, *.xml, *.sgml, *.sgm, *.opml, *.kml, *.lxfml, *.yml, *.yml2, *.yaml
  • *.svg
  • *.php, *.asp, *.js, *.css
  • *.pl, *.perl, *.rb, *.ruby, *.lua, *.has
  • *.lsp, *.lisp
  • *.vhdl
  • *.java, *.mf
  • *.tcl, *.py, *.bas, *.vbs, *.pas
  • *.c, *.cc, *.cpp, *.c++, *.cxx
  • *.h, *.hh, *.hpp, *.h++, *.hxx
  • *.f, *.for, *.f90
  • *.mak, *.log, *.conf, *.cfg
  • *.readme, *.me
  • *.nfo
  • *.ts, *.ui, *.qrc, *.pro, *.rc
  • *.nsi, *.nsh
  • *.bat, *.sh
  • *.m3u, *.pls, *.lst
  • *.lrc
  • *.latex, *.bib, *.tex
  • *.asc, *.ascii, *.utf8
  • *.hz
  • *.sms
Depending on feedback, we may add more or remove some of them in future versions.
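
The suffix check described above can be sketched in a few lines of Python. This is an illustration under assumptions, not ZhGrep's own code: the names PLAIN_TEXT_SUFFIXES and is_full_text_candidate are hypothetical, the set holds only a handful of the suffixes listed above, and comparing suffixes case-insensitively is our guess.

```python
from pathlib import Path

# A small excerpt of the suffix list above (not the complete list).
PLAIN_TEXT_SUFFIXES = {
    ".txt", ".text", ".html", ".htm", ".xml",
    ".php", ".js", ".css", ".py", ".c", ".h",
    ".lrc", ".tex",
}


def is_full_text_candidate(path, suffixes=PLAIN_TEXT_SUFFIXES):
    """Decide by suffix alone whether a file joins a full-text search.

    The file content is never inspected here, so "binary.txt" is
    accepted even if it holds binary data.
    """
    # Assumption: suffixes are compared case-insensitively.
    return Path(path).suffix.lower() in suffixes


def full_text_candidates(root, suffixes=PLAIN_TEXT_SUFFIXES):
    """List candidate files under `root`, including sub-directories."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and is_full_text_candidate(p, suffixes)
    )
```

With this sketch, is_full_text_candidate("song.lrc") and is_full_text_candidate("binary.txt") are both true, while is_full_text_candidate("report.pdf") is false.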

Usage & Screenshots


We illustrate the main usage of ZhGrep here. The details are left for you, dear users, to explore. If you find this program useful, please do not hesitate to write to us to share your experience or send us your feedback. If you would like to volunteer to translate this software into your mother tongue, simply drop us a line too. Have fun!

Doing a Full-Text Search in Plain-Text Files


As the tooltips say when you hover the cursor over certain fields or buttons, you may search with keywords in any language, as long as the computer system supports their input. You also need the fonts necessary to display them; otherwise, you might see garbled text in the bottom-right pane.

The screenshot below shows, in an English user interface, a search for 2 Chinese keywords. As this is a full-text search (see the button?), a total of 290 plain-text files were searched, here the .lrc lyric files in the given directory and all of its sub-directories, and two were found to contain both keywords.

The files are listed in a table in the bottom-left pane. When you left-click any of the result files, its content is shown in the bottom-right pane with the keywords highlighted. If you right-click a result file, a context menu pops up with further actions: you can either copy the full file path to the system clipboard or open its containing folder in the system file explorer.

Image


Minimizing to the System Tray


When the program window is closed for the first time, a message box appears telling you that the program will keep running in the system tray.

Image


This FYI message box will not appear again. Now, if we hover the cursor over the icon in the system tray, a small window shows a summary of the last search operation (if there is one) in addition to the program version information.

Image


Multi-Lingual Capability


To demonstrate the multilingual feature of ZhGrep, we include here a screenshot of a search for several French keywords in a Simplified Chinese user interface. You can see that the word "âme", which includes a French accented letter (avec l'accent circonflexe), is perfectly matched!

Image



This time, we want to search for a filename containing "für dich", where "ü" is a German letter with an umlaut. As ZhGrep always performs substring matching when doing filename search, it will list all files whose names contain either "für" or "dich". The result here happens to be a file containing both words. In this case, it is the full path of the matched and selected file that is shown in the bottom-right pane.

Image
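
The filename search behaviour just described (substring matching, where any one of the words is enough) can be sketched as follows. The function name filename_matches is hypothetical and this is not ZhGrep's actual code; how ZhGrep treats upper and lower case is not covered here, so the sketch simply compares characters as they are.

```python
def filename_matches(filename, query):
    """Return True if any word of `query` occurs inside `filename`.

    Words are split on whitespace and matched as plain substrings,
    so "für" also matches a name such as "fürst.txt".
    """
    return any(word in filename for word in query.split())


names = ["Nur für dich.lrc", "fürst.txt", "readme.txt"]
hits = [n for n in names if filename_matches(n, "für dich")]
```

Here hits keeps the first two names: the first contains both words, the second contains "für" as part of a longer word, and "readme.txt" contains neither.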


Caveat


ZhGrep follows a set of internal conventions when searching, in order to keep the program from becoming too complicated. Notably:
  • When doing multi-keyword full-text search, ZhGrep only searches plain-text files. Filename search, however, is not limited to plain-text files.
  • When doing filename search, ZhGrep always applies substring matching.
  • When doing multi-keyword full-text search, ZhGrep always applies whole-word matching.
  • Depending on the detected system language, ZhGrep decides whether whole-word keywords should be delimited by spaces (in addition to punctuation marks). For example, when the system language is detected to be zh_CN (Simplified Chinese), ZhGrep will NOT require a word to be surrounded by spaces to be matched as a whole word, even when the search is done on a non-Chinese language. This is why you see "danses" and "dansent" also match the keyword "dans" shown above in the section "Multi-Lingual Capability". However, if your system language is detected to be, say, French, ZhGrep will always try to match whole words surrounded by spaces, so "danses" and "dansent" will not be matched in this example. In future versions, we may add a checkbox to let users specify whether to apply such delimiters when searching.
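
The effect of the last convention can be illustrated with a short Python sketch. It is a simplified model under assumptions, not ZhGrep's implementation: the function names are hypothetical, a regular expression stands in for "delimited by spaces or punctuation marks", and only zh_CN is treated as a language that needs no delimiters because it is the only one named above. Requiring every keyword to be present follows the full-text example earlier, where the result files contained both keywords.

```python
import re

# Assumption: only the locale named in the text is listed here.
NO_DELIMITER_LOCALES = {"zh_CN"}


def keyword_in_text(keyword, text, require_delimiters):
    """Check one keyword against the text."""
    if not require_delimiters:
        # No delimiters needed: "dans" is found inside "danses".
        return keyword in text
    # The keyword must not touch a letter or digit on either side.
    pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
    return re.search(pattern, text) is not None


def file_matches(keywords, text, system_language):
    """A file matches when every keyword is found in its text."""
    require = system_language not in NO_DELIMITER_LOCALES
    return all(keyword_in_text(k, text, require) for k in keywords)


line = "Elles dansent et tu danses"
on_chinese_system = file_matches(["dans"], line, "zh_CN")
on_french_system = file_matches(["dans"], line, "fr_FR")
```

In this model on_chinese_system is true and on_french_system is false, which mirrors the "dans" example above.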

Recognitions


Information about the recognitions that DAAV ZhGrep has received is available on the download page: Download DAAV ZhGrep