Perl Unicode 美食大全：Unicode 排序和打印演示

2012年6月22日，作者：Tom Christiansen

℞ 44：程序：Unicode 排序和打印演示

过去几周的 Unicode 美食大全是解释 Unicode 如何工作以及如何在程序中使用它。如果你已经看过这些美食大全，你现在比大多数程序员了解得更多。

把所有东西放在一起怎么样？

这里有一个完整的程序，展示如何使用与位置相关的排序、Unicode 转换大小写以及管理打印宽度，当一些字符占用零列或两列而不是每次一列时。运行以下程序将产生如下整齐对齐的输出（当然，对齐质量取决于你 Unicode 字体的质量）

    Crème Brûlée....... €2.00
    Éclair............. €1.60
    Fideuà............. €4.20
    Hamburger.......... €6.00
    Jamón Serrano...... €4.45
    Linguiça........... €7.00
    Pâté............... €4.15
    Pears.............. €2.00
    Pêches............. €2.25
    Smørbrød........... €5.75
    Spätzle............ €5.50
    Xoriço............. €3.00
    Γύρος.............. €6.50
    막걸리............. €4.00
    おもち............. €2.65
    お好み焼き......... €8.00
    シュークリーム..... €1.85
    寿司............... €9.99
    包子............... €7.50

这就是那个程序；在 v5.14 上进行了测试。

 #!/usr/bin/env perl
 # umenu - demo sorting and printing of Unicode food
 #
 # (obligatory and increasingly long preamble)
 #
 use utf8;
 use v5.14;                       # for locale sorting and unicode_strings
 use strict;
 use warnings;
 use warnings  qw(FATAL utf8);    # fatalize encoding faults
 use open      qw(:std :utf8);    # undeclared streams in UTF-8
 use charnames qw(:full :short);  # unneeded in v5.16

 # std modules
 use Unicode::Normalize;          # std perl distro as of v5.8
 use List::Util qw(max);          # std perl distro as of v5.10
 use Unicode::Collate::Locale;    # std perl distro as of v5.14

 # cpan modules
 use Unicode::GCString;           # from CPAN

 # forward defs
 sub pad($$$);
 sub colwidth(_);
 sub entitle(_);

 my %price = (
     "γύρος"             => 6.50, # gyros, Greek
     "pears"             => 2.00, # like um, pears
     "linguiça"          => 7.00, # spicy sausage, Portuguese
     "xoriço"            => 3.00, # chorizo sausage, Catalan
     "hamburger"         => 6.00, # burgermeister meisterburger
     "éclair"            => 1.60, # dessert, French
     "smørbrød"          => 5.75, # sandwiches, Norwegian
     "spätzle"           => 5.50, # Bayerisch noodles, little sparrows
     "包子"              => 7.50, # bao1 zi5, steamed pork buns, Mandarin
     "jamón serrano"     => 4.45, # country ham, Spanish
     "pêches"            => 2.25, # peaches, French
     "シュークリーム"    => 1.85, # cream-filled pastry like éclair, Japanese
     "막걸리"            => 4.00, # makgeolli, Korean rice wine
     "寿司"              => 9.99, # sushi, Japanese
     "おもち"            => 2.65, # omochi, rice cakes, Japanese
     "crème brûlée"      => 2.00, # tasty broiled cream, French
     "fideuà"            => 4.20, # more noodles, Valencian (Catalan=fideuada)
     "pâté"              => 4.15, # gooseliver paste, French
     "お好み焼き"        => 8.00, # okonomiyaki, Japanese
 );

 # find the widest allowed width for the name column
 my $width = 5 + max map { colwidth } keys %price;

 # So the Asian stuff comes out in an order that someone
 # who reads those scripts won't freak out over; the
 # CJK stuff will be in JIS X 0208 order that way.
 my $coll  = Unicode::Collate::Locale->new( locale => "ja" );

 for my $item ($coll->sort(keys %price)) {
     print pad(entitle($item), $width, ".");
     printf " €%.2f\n", $price{$item};
 }

 sub pad($$$) {
     my($str, $width, $padchar) = @_;
     return $str . ($padchar x ($width - colwidth($str)));
 }

 sub colwidth(_) {
     my($str) = @_;
     return Unicode::GCString->new($str)->columns;
 }

 sub entitle(_) {
     my($str) = @_;
     $str     =~ s{ (?=\pL)(\S)     (\S*) }
              { ucfirst($1) . lc($2)  }xge;
     return $str;
 }

很简单，不是吗？放在一起，一切都能很好地工作。

上一篇文章：℞ 43：DBM 文件中的 Unicode 文本（简单方法）

系列索引：标准序言

下一篇文章：℞ 45：更多资源

标签

unicode