Named Entity Recognition

A Named Entity (NE) is a phrase in a text that uniquely refers to an entity in the world. Named entities include proper nouns, dates, identification numbers, phone numbers, e-mail addresses and so on. Because dates and other simpler categories are usually identified with hand-written regular expressions, we focus on proper names such as organisations, persons, locations, genes or proteins.
The identification and classification of proper nouns in plain text is of key importance in numerous natural language processing applications. It is the first step of an Information Extraction (IE) system, as proper names generally carry important information about the text itself and are thus targets for extraction. Named Entity Recognition (NER) can also be a stand-alone application. Besides IE, Machine Translation also has to handle proper nouns and certain other kinds of words differently, because specific translation rules apply to them.
For Hungarian NER we constructed manually annotated corpora. Based on these and on available English corpora, we developed a NER system. It employs a rich feature set and has been successfully applied to Hungarian and English newswire NER and also to English clinical NER.
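To illustrate why the simpler categories are left to hand-written regular expressions, here is a minimal Java sketch. The two patterns (ISO-style dates and a simplified e-mail shape) and the class name SimpleEntityPatterns are assumptions made for this example only; they are not part of the NER tool described on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the NER tool: shows regex-based
// recognition of two "simple" entity categories.
final class SimpleEntityPatterns {
    // Assumed date format: YYYY-MM-DD. Real text needs many more variants.
    static final Pattern DATE = Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");
    // Deliberately simplified e-mail shape; it does not cover the full address syntax.
    static final Pattern EMAIL = Pattern.compile("\\b[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+\\b");

    // Collects every substring of the text that matches the given pattern.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Contact info@example.org before 2010-05-17.";
        System.out.println(findAll(DATE, text));
        System.out.println(findAll(EMAIL, text));
    }
}
```

Proper names do not follow such fixed surface patterns, which is why they are handled by a trained statistical model instead.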

Download

Named Entity Recognition tool for Hungarian [download]

  • (using the CRF implementation of MALLET)
  • The English trained model will be available soon.

How to use from the command line

  • Parameters:
    • mode: It defines the process(es) to be executed. Possible values are:
      • predicate
    • input: It defines the input file on which the process will be executed. The input file must be a plain text (.txt) file containing running (raw) text.
    • output: It defines the output file in which the analysis will be saved. If this parameter is not set, the prediction is written to standard output.
  • Examples:
    • java -Xmx3G -jar ner.jar -mode predicate -input input.txt -output output.txt
    • java -Xmx3G -jar ner.jar -mode predicate -input input.txt
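If the tool has to be started from another Java program without adding it to the classpath, the same command line can be assembled and launched with the standard ProcessBuilder class. The sketch below is only an illustration: the class name NerCommandLine and the file names are hypothetical, it assumes that java is on the PATH and that ner.jar is in the working directory, and it uses only the flags listed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper, not shipped with the tool: builds the command line
// shown in the examples above and runs it as a child process.
final class NerCommandLine {
    // Builds the argument list. Pass null as outputPath to omit -output,
    // in which case the tool prints its result instead of saving it.
    static List<String> build(String jarPath, String inputPath, String outputPath) {
        List<String> command = new ArrayList<>(List.of(
                "java", "-Xmx3G", "-jar", jarPath,
                "-mode", "predicate",
                "-input", inputPath));
        if (outputPath != null) {
            command.add("-output");
            command.add(outputPath);
        }
        return command;
    }

    public static void main(String[] args) throws Exception {
        // Assumed file names, matching the examples above.
        List<String> command = build("ner.jar", "input.txt", "output.txt");
        // inheritIO() forwards the child's output and errors to this console.
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        System.out.println("ner.jar exited with code " + exitCode);
    }
}
```

Keeping the argument list in a separate method makes it easy to check the command that will be run before actually starting the process.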

How to use from Java code

  • NamedEntityRecognizer ner = new NamedEntityRecognizer();
  • ner.predicate("Egyszerű szöveges tartalom Szegedről.");

References