#	This file is part of the software similarity tester SIM.
#	Written by Dick Grune, Vrije Universiteit, Amsterdam.
#	$Id: README,v 2.17 2015-04-29 18:18:21 dick Exp $

These programs test for similar or equal stretches in one or more program
or text files and can be used to detect common code or plagiarism. See sim.pdf.
Checkers are available for C, Java, Pascal, Modula-2, Lisp, Miranda and
natural language text.

>>>> NEW, Apr 2015:
    - better percentage computation

>>>> NEW, Jan 2014:
    - 64-bit compatible
    - also works on 32-bit machines using software 64-bit emulation
    - accepts | as new-old separator

>>>> NEW, June 6, 2012:
    - greatly improved percentage computation
    - increased resolution, reducing false positives in sim_text
    - // comments in C recognized
    - characters 0200-0377 accepted in sim_text
    - s p a c e d   w o r d s recognized in sim_text
    - UNICODE file names accepted
    - manual page in PDF

==== To install on any system with gcc, flex, cp, ln, echo, rm, and wc,
or their equivalents, for example UNIX/Linux or MSDOS+MinGW:

Unpack the archive sim_2_*.zip
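
Before compiling, it may help to confirm that the tools listed above are
actually on the PATH. The following is a minimal sketch, not part of the SIM
distribution; the function name missing_tools is made up for this example.
It prints the name of each tool it cannot find and prints nothing when all
are present.

```shell
# Hypothetical helper, not shipped with SIM: print every tool named in
# the arguments that cannot be found as a command on this system.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The tool list is the one given in the install heading above.
missing_tools gcc flex cp ln echo rm wc
```

Any name printed needs to be installed, or replaced by an equivalent in the
Makefile, before running make.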

To compile and test, edit the Makefile to fit the local situation, and call:

    make test

This will generate one executable called sim_c, the checker for C, and will
run two small tests to show sample output.

To install, examine the Makefile, set BINDIR and MAN1DIR to sensible paths,
and call:

    make install
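
As an alternative to editing the Makefile, make in general lets variables be
overridden on the command line. The sketch below demonstrates that mechanism
on a stand-in Makefile written to a temporary directory; the stand-in and its
default paths are assumptions for illustration and are NOT SIM's real
Makefile. Whether the real Makefile honors such overrides depends on how it
assigns BINDIR and MAN1DIR, so check it first.

```shell
# Stand-in Makefile (hypothetical, not SIM's) whose install target only
# reports the directories it would use.
demo=$(mktemp -d)
printf 'BINDIR = /usr/local/bin\nMAN1DIR = /usr/local/man/man1\ninstall:\n\t@echo "bin=$(BINDIR) man=$(MAN1DIR)"\n' > "$demo/Makefile"

# Variables given on the command line take precedence over ordinary
# assignments inside the Makefile.
result=$(cd "$demo" && make -s install BINDIR="$demo/bin" MAN1DIR="$demo/man/man1")
echo "$result"

rm -r "$demo"
```

With the real distribution the corresponding call would have the form
"make install BINDIR=... MAN1DIR=...", with paths chosen for the local system.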

To change the default run size or the page width, adjust the file settings.par
and recompile.

==== To install on MSDOS without a C compiler, unpack the archive
sim_exe_2_*.zip, which contains:

    SIM_C.EXE       similarity tester for C
    SIM_JAVA.EXE    similarity tester for Java
    SIM_PASC.EXE    similarity tester for Pascal
    SIM_M2.EXE      similarity tester for Modula-2
    SIM_LISP.EXE    similarity tester for Lisp
    SIM_MIRA.EXE    similarity tester for Miranda
    SIM_TEXT.EXE    similarity tester for text

==== To extend:

To add another language L, write a file Llang.l along the lines of clang.l
and the other *lang.l files, extend the Makefile and recompile.
All knowledge about a given language L is located in Llang.l; the rest of
the programs expect each token to be a 16-bit character.

Available at present:
    clang.l javalang.l pascallang.l m2lang.l lisplang.l miralang.l textlang.l


                Dick Grune
                email: dick@dickgrune.com
                http://www.dickgrune.com
