isi.rb version 0.8 (formerly named isi2bibtex.rb)

From: NISHIMATSU Takeshi <t-nissie@imr.edu>
Newsgroups: fj.sources,fj.comp.lang.ruby,fj.comp.texhax
Followup-to: fj.sources.d
Subject: isi.rb version 0.8 (formerly named isi2bibtex.rb)
Date: 13 Apr 2005 22:12:57 +0900
Organization: Tohoku Univ InterNetNews Site
Message-ID: <yeymzs27rc6.fsf@pentium3dual.imr.tohoku.ac.jp>
Followuped-by: <3991786news.pl@rananim.ie.u-ryukyu.ac.jp>

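This is Nishimatsu.

I have improved isi2bibtex.rb, which I posted earlier, turned it
into a library, and am releasing it as isi.rb. isi.rb is a Ruby
script that converts the tagged output files of Web of Science,
ISI's huge database of academic papers, into BibTeX format.
For example, it can be used like this: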
 % ruby isi.rb savedrecs.txt
 % ruby isi.rb savedrecs1.txt savedrecs2.txt > savedrecs.bib
 % ruby isi.rb < savedrecs.txt > savedrecs.bib
 % cat savedrecs.txt | ./isi.rb > savedrecs.bib
and so on.
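
To give a feel for the conversion, here is a minimal standalone sketch
in Ruby. It is not the code of isi.rb: the names read_record and
to_bibtex_sketch, the sample record, and the reduced set of BibTeX
fields are all made up for this illustration. It only assumes the
tagged layout that the script below also relies on: each field starts
with a two-character tag, continuation lines start with three spaces,
and an ER line closes a record.

```ruby
require 'stringio'

# Hypothetical sample record; all field values are placeholders.
SAMPLE = <<~ISI
  FN ISI Export Format
  VR 1.0
  PT J
  AU Doe, J
     de Example, AB
  TI An example title that continues
     on a second line
  JI Phys. Rev. B
  PY 2004
  VL 69
  BP 123
  EP 130
  ER

  EF
ISI

# Read one record (up to the ER line) from an IO-like object into a Hash
# that maps each tag to an Array of its lines. Returns nil at end of input.
def read_record(io)
  hash = {}
  last_tag = nil
  while (line = io.gets)
    line = line.chomp
    case line
    when /\AER/ then return hash                     # end of a record
    when /\A(?:FN|VR|EF)/, /\A\s*\z/ then next       # file-unique tags, blank lines
    when /\A([A-Z][A-Z0-9]) (.*)\z/                  # "TAG value"
      last_tag = $1
      (hash[last_tag] ||= []) << $2
    when /\A   (.*)\z/                               # continuation line
      hash[last_tag] << $1 if last_tag
    end
  end
  nil
end

# Build a reduced BibTeX entry (far fewer fields than isi.rb writes).
def to_bibtex_sketch(rec)
  key = "#{rec['AU'].first[/\A[^,]+/]}:#{rec['PY'].first}".delete(' ')
  <<~BIB
    @ARTICLE{#{key},
      Author  = {#{rec['AU'].join(' and ')}},
      Title   = {#{rec['TI'].join(' ')}},
      Journal = {#{rec['JI'].join(' ')}},
      Year    = {#{rec['PY'].first}}}
  BIB
end

io = StringIO.new(SAMPLE)
records = []
while (rec = read_record(io))
  records << rec
end
records.each { |r| puts to_bibtex_sketch(r) }
```

The loop has the same shape as the main loop of isi.rb: keep reading
records until the reader returns nil, and print one entry per record.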

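As a bonus, it also supports the "bulk paper registration file"
format of the Tohoku University Information Database System. Using
this as a reference, please require isi.rb as a library and put it
to use for tasks such as data entry into the databases for individual
performance evaluation that have become so popular in recent years.
(Ruby is amazing. This bonus took a little over two hours to write.)
It looks like this: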
 % ruby -r isi.rb -e 'while rec=ARGF.read_an_ISI_record; print rec.to_tohoku_DB(12345678,"NISHIMATSU Takeshi"); end' my2003.isi my2004.isi
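
As a rough standalone illustration of what building such a
tab-separated row involves, here is a small sketch. The names
given_first and to_row and the seven-column layout are assumptions for
this example only; the real column layout of the Tohoku University
format is the one in to_tohoku_DB in the script below. The name
conversion follows the same idea as the script: "Family, INITIALS" as
found in the AU tag becomes "I. N. Family".

```ruby
# "Family, INITIALS" -> "I. N. Family"; names without ", " are kept unchanged.
def given_first(name)
  family, initials = name.split(', ', 2)
  return name if initials.nil?
  initials.scan(/\w/).join('. ') + '. ' + family
end

# Hypothetical column layout, NOT the real Tohoku University format.
def to_row(id, name, rec)
  authors = rec[:authors].map { |a| given_first(a) }.join(', ')
  [id, name, rec[:title], rec[:journal], rec[:volume], rec[:year], authors].join("\t") + "\n"
end

# Placeholder record; title, journal, volume and year are made up.
rec = { authors: ['de Haas, WJ', 'van Alphen, PM'], title: 'Example title',
        journal: 'Example J.', volume: 1, year: 2004 }
row = to_row(12345678, 'NISHIMATSU Takeshi', rec)
print row
```

Joining an array with "\t" keeps the columns in one place, which makes
it easier to adapt the row to another database's import format.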

Note that the text formatting library by Austin Ziegler,
Text::Format <http://www.halostatue.ca/ruby/Text__Format.html>,
is required.

   love && peace && free_software
   NISHIMATSU Takeshi % someone who may be fired unless he increases the "number" of his papers

ーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーー
#!/usr/bin/env ruby
=begin
= isi.rb - convert ISI Export Format to BibTeX Format. (Formerly named isi2bibtex.rb.)

== What is isi.rb?
isi.rb converts ISI Export Format to BibTeX Format.
This is a Ruby script. You can use this script as a library.

You can get the tagged Marked List in Web of Science by pushing the
[SAVE TO FILE] button.

== Copying
isi.rb is distributed in the hope that
it will be useful, but WITHOUT ANY WARRANTY.
You can copy, modify and redistribute isi.rb,
but only under the conditions described in
the GNU General Public License (the "GPL").

== Who is the author?
NISHIMATSU Takeshi <t-nissie{at}imr.tohoku.ac.jp>

== Why did he write it?
Because he does not like the output format of the Perl version.

== Is there a Perl version?
Yes.
You can find the Perl version by Jonathan Swinton, Ben Bolker, Anthony Stone, John J. Lee
((<"in CTAN"|URL:http://ring.tains.tohoku.ac.jp/archives/text/CTAN/biblio/bibtex/utils/isi/>)).

== Where can I get isi.rb?
Please download isi.rb from:
((<URL:http://www-lab.imr.tohoku.ac.jp/%7Et-nissie/computer/software/isi/isi.rb>))

== How can I install it?
(1) Install ((<"Ruby"|URL:http://www.ruby-lang.org/>))
(2) Install ((<"Text::Format"|URL:http://www.halostatue.ca/ruby/Text__Format.html>))
(3) Put isi.rb somewhere on your system.

== How can I use it?
(1) Mark the articles in ISI Web of Science.
(2) View and save the marked records to an output file (savedrecs.txt).
    I recommend checking "Author(s)", "Title", "Source", "abstract*",
    "abstract", "keywords" and "source abbreviation" as the fields to
    include in the output file.
(3) Then, here are some examples:
 % ruby isi.rb savedrecs.txt
 % ruby isi.rb savedrecs1.txt savedrecs2.txt > savedrecs.bib
 % ruby isi.rb < savedrecs.txt > savedrecs.bib
 % cat savedrecs.txt | ./isi.rb > savedrecs.bib
 % ruby -r isi.rb -e 'while rec = ARGF.read_an_ISI_record; print rec.to_tohoku_DB(12345678,"foo"); end' my2003.isi my2004.isi

== I do not like the output format of isi.rb either!
The output format is defined in the source code in a WYSIWYG manner
(see ISI_record#to_bibtex), so you can easily change it yourself.

== ChangeLog
=== 2005-04-13
* ISI_record#to_tohoku_DB(id, name)
* isi.rb version 0.8 is announced at fj.sources, fj.comp.lang.ruby and fj.comp.texhax.

=== 2005-03-30
* tag LA
* isi2bibtex.rb version 0.7

=== 2004-06-18
* ISI_record#fmt()
* isi2bibtex.rb version 0.6 is released!

=== 2004-06-17
* require ((<"Text::Format"|URL:http://www.halostatue.ca/ruby/Text__Format.html>))
  by Austin Ziegler.
* Title, Keywords, NewKeywords, and Abstract are nicely formatted into fixed-width.
* isi2bibtex.rb version 0.5 is released!

=== 2004-06-09
* simplified.
* tags are sorted.
* Reports "Filename:LineNumber: ..." when unknown tags are found.
* isi2bibtex.rb version 0.4

=== 2004-06-07
* Format of ref_name is changed to author:[authors:]journal:volume:page:year.
* Author names such as "de Haas, WJ" and "van Alphen, PM" are now handled.
* AR tag (article number of new APS journals) is now available.
* The case "BP art. no., EP 125111" is now handled.
* isi2bibtex.rb version 0.3 is released!

=== 2002-06-28
* isi2bibtex.rb version 0.2 is released!

== Meanings of tags in ISI Export Format:
See ((<URL:http://isibasic.com/help/helpprn.html>))
=== file-unique tags
 FN: File type. The file starts with 'FN ISI Export Format'
 VR: Version number of ISI export file format
 EF: End of file
=== normal tags
 AB: Abstract
 AR: Article number of new APS journals
 AU: Authors
 BP: Beginning page
 C1: Research addresses
 CR: Cited references
 DE: Original keywords
 DT: Document type
 EP: Ending page
 ER: end of a record
 GA: ISI document delivery number
 ID: New keywords given by ISI
 IS: issue
 J9: 29-character journal title abbreviation
 JI: ISO journal title abbreviation
 LA: Language
 NR: Cited reference count
 PD: Publication date e.g. "JUN 8" or "JUL"
 PG: the number of pages
 PI: Publisher city
 PN: Part number
 PT: Publication type (e.g., book, journal, book in series)
 PU: Publisher
 PY: Publication year
 RP: Reprint address
 SE: Book series title
 SI: Special issue
 SN: ISSN
 SO: journal title, in full
 SU: Supplement
 TC: Times cited
 TI: Title
 UT: ISI unique article identifier
 VL: Volume
 WP: Publisher web address

== Known bugs
* none.

== TODO
* Write papers, not tools for writing papers.

== References
* ((<The BibTeX Format|URL:http://www.ecst.csuchico.edu/~jacobsd/bib/formats/bibtex.html>))
* ((<Bibliography (BibTeX) Tools|URL:http://www.ecst.csuchico.edu/~jacobsd/bib/tools/bibtex.html>))

== Thanks to contributor(s)!
* Marcin Dulak
=end
ISI_RB_VERSION = "0.8"
require 'text/format'
class ISI_record
public
  @@order = 0
  def initialize(hash)
    @hash = hash
    @@order += 1
  end

  def to_tohoku_DB(id, name)
    "#{id}\t"              + # A ID
    "#{name}\t"            + # B Name
    "#{@hash['TI']}\t"     + # C Title
    "#{@hash['TI']}\t"     + # D Title in English
    "01\t"                 + # E Language
    "1\t"                  + # F The number of Author(s) 1:1, 2:not 1
    "1\t"                  + # G Kind - 1:regular paper
    "1\t"                  + # H Refereep - 0:nil 1:t
    "0\t"                  + # I Invitedp - 0:nil 1:t
    "Greatly\t"            + # J Contribution
    "#{@hash['JI']}\t"     + # K Journal
    "#{@hash['JI']}\t"     + # L Journal in English
    "#{@hash['VL']}\t"     + # M Vol.
    "#{@hash['IS']}\t"     + # N No.
    "#{@hash['AR'] or @hash['BP']}\t" +
    "\t"                   + # P Page END
    "#{@hash['PY']}\t"     + # Q Year
    "#{month_in_number}\t" + # R Month
    "\t"                   + # S Date
    "#{and_separated_authors.gsub(" and ",", ")}\t" +
    "#{and_separated_authors.gsub(" and ",", ")}\t" +
    "\t"                   + # V URL of online journal
    "\t"                   + # W Other
    "#{@hash['PY']-2000}#{sprintf("%.2d",@@order)}\t" +
    "1\n"
  end

  def to_bibtex
    return nil if self==nil
    "@ARTICLE{#{ref_name},
        Author     = {#{and_separated_authors}},
        Title      = {#{fmt('TI')}},
        Journal    = {#{@hash['JI']}},
        JournalFull= {#{@hash['SO']}},
        Year       = {#{@hash['PY']}},
        Month      = {#{month}},
        Volume     = {#{@hash['VL']}},
        Number     = {#{@hash['IS']}},
        Pages      = {#{pages}},
        Keywords   = {#{fmt('DE')}},
        NewKeywords= {#{fmt('ID')}},
        Abstract   = {#{fmt('AB')}},
        URL        = {},
        MyComment  = {},
        WhereIFiledIt= {}}\n\n"
  end

private
  FMT = Text::Format.new(:columns => 80, :first_indent => 1, :left_margin => 22)
  def fmt(tag)
    FMT.paragraphs(@hash[tag]).sub(/                       /,'')
  end            # This String#sub discards the indent of the first line.

  def pages
    if @hash['AR']
      return @hash['AR']   # article number of new APS journals
    elsif @hash['BP'] =~ /^art/
      return @hash['EP']   # in the case of "BP art. no., EP 125111"
    elsif @hash['EP'] =~ /^\w?\d+/
      return @hash['BP'] + '-' + @hash['EP']   # in the cases of "EP 1234" or "EP L567"
    else
      return @hash['BP']
    end
  end

  def month
    if @hash['PD'] =~ /^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/i
      return $1.upcase
    else
      return ''
    end
  end

  def month_in_number
    if @hash['PD']
      mon = @hash['PD'][0..2].upcase
      ary = ["Dmy", "JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
      return ary.index(mon)
    else
      return 1
    end
  end

  def and_separated_authors
    au = ''
    @hash['AU'].each_with_index do |name,i|
      au += ' and ' if i>0
      if name =~ /, /
        family_name = $`
        initials = $'
        au += initials.scan(/\w/).join(". ") + '. '  # "ABC" -> "A. B. C. "
        au += family_name
      else
        au += name
      end
    end
    return au
  end

  def ref_name
    rn = ''
    @hash['AU'].each_with_index do |name,i| 
      if i==0
        if name =~ /, /
          rn << $` << ':'       # Take the component before /, /
        else
          rn << name << ':'     # Take the whole name
        end
      else
        rn << name[0,1] << ':'  # Take the first character of the name
      end
    end
    rn << @hash['JI'].to_s << ':'<< @hash['VL'].to_s << ':p'<< pages.to_s << ':'<< @hash['PY'].to_s   # pages may be nil
    return rn.gsub(/\. */,'').gsub(/ +/,'').sub(/PhysRev(A|B|C|D|E)/,'PR\1').sub(/PhysRevLett/,'PRL')
  end
end

class Object
  def read_an_ISI_record   # for ARGF
    hash = {}
    while line = gets
      #===== a few special cases
      return ISI_record.new(hash) if line =~ /^ER/
      next if line =~ /^(EF|FN|VR)/   # ignore file-unique tags
      next if line =~ /^\s*$/         # ignore blank lines
      while line =~ /\!$/
        line = line.chomp.chop   # continued to next line if the line ends with "!"
        nxt = gets
        break if nxt.nil?        # stop at end of input instead of raising
        line << nxt
      end
      #===== Normal tags
      case line
      when /^AU /
        authors = [line.chomp.sub(/^AU /,'')]
        while (line = gets) =~ /^   /
          authors.push(line.chomp.sub(/^   /,''))
        end
        hash['AU'] = authors
        redo
      when /^(TC|PG|PY) (.*)$/
        hash[$1] = $2.to_i
      when /^(TI|AB|DE|ID|SO|PT|JI|BP|EP|AR|PD|VL|IS|GA|PI|PU|PN|PA|J9|UT|DT|C1|RP|SI|SE|SU|LA) (.*)$/
        tag = $1
        str = $2
        while (line = gets) =~ /^   /
          str << line.chomp.sub(/^  /,'')
        end
        hash[tag] = str
        redo
      else
        STDERR << "#{$FILENAME}:#{file.lineno}: Unknown tag: #{line}"
      end
    end
    return nil
  end
end

if $0 == __FILE__
  # MAIN LOOP
  while rec = ARGF.read_an_ISI_record
    print rec.to_bibtex
  end
end
