‘Shimbun’ is a library set of emacs-w3m that enables you to read certain web contents using Gnus, Wanderlust, or Mew as if they were email messages. Here we will explain how to make a new ‘shimbun’ module.
When you make a new ‘shimbun’ module ‘foobar’ for reading contents of http://www.foobar.net, what you have to do first is to put the following S expressions in the first part of the ‘sb-foobar.el’ file:
    (require 'shimbun)
    (luna-define-class shimbun-foobar (shimbun) ())
We will explain what they are below, so for now you can regard them as just incantations. You have to use the same suffix ‘foobar’ in the file name (‘sb-foobar.el’) and in the class name (‘shimbun-foobar’) given as the first argument to the luna-define-class macro.
Major jobs of the ‘shimbun-foobar’ module can be classified broadly into the following four categories (note that you may rephrase “folder” with “group” if you are a Gnus user):
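1. Getting the web page source for a folder from a certain URL.
2. Gathering subjects and other information from that source to build headers.
3. Getting the web page source of an individual article.
4. Processing that source into the contents of an article to be displayed in the MUA.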
The shimbun-headers method does the first job, the shimbun-get-headers method does the second, the shimbun-article method does the third, and the shimbun-make-contents method does the last. The default methods for those categories are defined in the ‘shimbun.el’ module.
Open the ‘shimbun.el’ file. You may see unfamiliar definitions like luna-define-generic or luna-define-method there. Hm, they look like defun, don’t they? You may also notice that the former form contains just a doc-string and that the same symbol is declared again by the latter form. Furthermore, some symbols are declared only by the luna-define-generic form, not by the luna-define-method form. What on earth are we seeing? Isn’t the program written in the Emacs-Lisp language?
The truth is that the ‘shimbun’ modules use the ‘luna.el’ module provided by FLIM which enables you to write object oriented programs in the Emacs-Lisp language.
There are method programs defined rigidly for specific purposes in the ‘shimbun.el’ module. The shimbun-headers method gets a page source from a certain URL, the shimbun-get-headers method gathers subjects and other information, and so on (see above). They do routine work, so surely they cannot cope with the variety of web contents in the world? Oh, you shouldn’t believe such a heresy! The ‘shimbun.el’ module only provides the default method functions.
Remember the defadvice feature. There are three ways to modify the behavior of a function: :before, :around and :after. Similarly, each default ‘shimbun’ method function can be modified for a certain purpose (note that the :around method-qualifier can be omitted). It is worth stressing that the modification takes effect only when the specified ‘shimbun’ module is selected.
Now, as you may have understood, the luna-define-generic form provides only a husk in a sense, the luna-define-method form defines an actual function which can be different for each ‘shimbun’ module, and the luna-define-class form declares the ‘shimbun’ class in the first part of the ‘sb-foobar.el’ module.
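If you do not have FLIM at hand, you can get a feeling for the husk, the actual function and the :around modification with the cl-generic library that ships with Emacs. The following is only an analogy under that assumption; it does not use ‘luna.el’, and every name starting with ‘my-’ is made up for this illustration.

```elisp
(require 'cl-lib)
(require 'cl-generic)

;; Two hypothetical classes: a parent and a module-specific child.
(cl-defstruct my-shimbun)
(cl-defstruct (my-shimbun-foobar (:include my-shimbun)))

;; The "husk": only a name, an argument list and a doc-string.
(cl-defgeneric my-clear-contents (shimbun text)
  "Return TEXT cleaned up for SHIMBUN.")

;; The default method, comparable to what a base module would provide.
(cl-defmethod my-clear-contents ((_shimbun my-shimbun) text)
  text)

;; A modification that is effective only for the foobar class.
;; It strips a made-up "[AD]" marker, then calls the default method.
(cl-defmethod my-clear-contents :around ((shimbun my-shimbun-foobar) text)
  (cl-call-next-method shimbun
                       (replace-regexp-in-string "\\[AD\\]" "" text)))
```

Calling my-clear-contents with an instance made by make-my-shimbun leaves the text alone, while an instance made by make-my-shimbun-foobar gets the marker removed first. That is the same kind of per-module behavior the luna forms give to ‘shimbun’ modules.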
First, let’s identify the target web page URL from which to gather subjects and other information. If a web site uses frames, the target is only one of its web pages. Second, let’s create the body of the shimbun-index-url method function using the luna-define-method form in your ‘sb-foobar.el’ file. Also define the user-customizable variable shimbun-foobar-groups, which we will explain later(11).
    (defvar shimbun-foobar-url "http://www.foobar.net")
    (luna-define-method shimbun-index-url ((shimbun shimbun-foobar))
      shimbun-foobar-url)
    (defvar shimbun-foobar-groups '("news"))
After you create the body of the shimbun-index-url method, the shimbun-headers method can get a web page source, since the ‘shimbun.el’ module already has the default shimbun-headers method. After the shimbun-headers method gets a web page source, it calls the shimbun-get-headers method to gather header information. As the ‘shimbun.el’ module does not have the shimbun-get-headers method, you have to create it in your ‘sb-foobar.el’ file.
Now look carefully at the page source and create the shimbun-get-headers method in your ‘sb-foobar.el’ file. Create a regular expression that can gather header information. The minimum necessary information is the subject, date, author, URL and message-id of an article. They are used in the MUA as Subject, Date, From, Xref and Message-ID.
If you want to make an article from a line in a web page source, like:
    <a href="053003.html">some talks on May 30(posted by Mikio <foo@bar.net>)</a>
use the following regexp:
    "<a href=\"\\(\\([0-9][0-9][0-9][0-9]\\)[0-9][0-9]\\.html\\)\">\\([^<(]+\\)(posted by \\([^)]+\\))</a>"
You can get the value for Xref from (match-string 1). You can get the value for Date by modifying the value of (match-string 2), Subject from (match-string 3), and From from (match-string 4). You can modify them further to show additional information in the MUA.
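To see which part of the sample line ends up in which match group, you can try the regexp on a plain string, outside of any ‘shimbun’ module. The sketch below assumes a variant of the regexp whose last group is written as [^)]+ so that the literal ‘<’ in the sample address is accepted; the names starting with ‘my-’ are hypothetical.

```elisp
(defvar my-foobar-line-regexp
  (concat "<a href=\"\\(\\([0-9][0-9][0-9][0-9]\\)[0-9][0-9]\\.html\\)\">"
          "\\([^<(]+\\)(posted by \\([^)]+\\))</a>")
  "Hypothetical regexp for one index line of the example site.")

(defun my-foobar-parse-line (line)
  "Return a plist of the fields found in LINE, or nil if LINE does not match."
  (when (string-match my-foobar-line-regexp line)
    ;; Copy the groups out first; later calls might change the match data.
    (let ((xref (match-string 1 line))
          (mmdd (match-string 2 line))
          (subject (match-string 3 line))
          (from (match-string 4 line)))
      (list :xref xref
            ;; Group 2 holds the month and day digits, as in "0530".
            :month (string-to-number (substring mmdd 0 2))
            :day (string-to-number (substring mmdd 2 4))
            :subject subject
            :from from))))
```

The month and the day are returned as numbers here only to show what "modifying the value" of the second group can mean; how you turn them into a Date string is up to your module.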
If the URL of an article is a relative path like the one above, use shimbun-expand-url to expand it before putting the information into the header. If articles do not each have a unique URL (i.e. the URL of the headers and the URL of the articles are the same), you have to ask Emacs to remember the body of each article when gathering header information. For more details, see the files ‘sb-palmfan.el’, ‘sb-dennou.el’ and ‘sb-tcup.el’.
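What expanding a relative path means can be tried without ‘shimbun’ at all. The sketch below uses url-expand-file-name from the URL library bundled with Emacs purely as a stand-in; a real module would call shimbun-expand-url as described above. The function name my-foobar-absolute-url and the index page address are assumptions made for this example.

```elisp
(require 'url-expand)

(defun my-foobar-absolute-url (relative index-url)
  "Resolve RELATIVE against INDEX-URL, the page on which the link was found."
  ;; Stand-in for `shimbun-expand-url'; this is the Emacs URL library.
  (url-expand-file-name relative index-url))
```

For instance, resolving "053003.html" against a hypothetical index page http://www.foobar.net/news/index.html should give an address inside the same ‘news’ directory.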
Sometimes you cannot identify the Date information from the web page source alone when gathering header information. If so, leave it and just set its value to a null string, "". If you can identify the Date only when you see the contents of an article, you can set it at that time in the shimbun-make-contents method. You may also use a fixed From for a web site (e.g. "webmaster@foobar.net").
Be careful when you build a message-id. Make sure it is unique; otherwise you may not be able to read some articles in the ‘shimbun’(12). Ensure uniqueness by building the message-id from date information, the domain of the page and/or a part of the URL of the page. Use ‘@’, not ‘:’, as a part of the message-id so that inline images can be displayed. See RFC2387 and RFC822 for more details.
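One possible way to follow this advice is sketched below. The layout of the ID (file name, group, then the domain after ‘@’) is only an assumption for the example site, and my-foobar-make-message-id is a hypothetical helper, not a function of ‘shimbun.el’.

```elisp
(defun my-foobar-make-message-id (url-file group)
  "Build a Message-ID from URL-FILE (e.g. \"053003.html\") and GROUP.
The result uses `@' and contains no `:'."
  (format "<%s.%s@www.foobar.net>"
          ;; Drop the ".html" extension; the digits identify the article.
          (file-name-sans-extension url-file)
          group))
```

Because the file name differs for every article and the group name is part of the ID, two articles in different groups do not collide even when their file names happen to be equal.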
Put this information into a header using the function shimbun-create-header of the ‘shimbun.el’ module.
A bare-bones shimbun-get-headers in your ‘sb-foobar.el’ file looks as follows:
    (luna-define-method shimbun-get-headers ((shimbun shimbun-foobar)
                                             &optional range)
      (let ((regexp "....")
            subject from date id url headers)
        ...
        (catch 'stop
          (while (re-search-forward regexp nil t nil)
            ...
            (when (shimbun-search-id shimbun id)
              (throw 'stop nil))
            (push (shimbun-create-header
                   0 subject from date id "" 0 0 url)
                  headers)))
        headers))
Note that you can access the ‘shimbun-foobar’ instance via the temporary variable shimbun in the method.
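The loop in that skeleton can be tried in isolation. The following sketch runs the same search-and-stop pattern over a string in a temporary buffer and collects plain property lists instead of real headers. It does not call shimbun-search-id or shimbun-create-header; the list SEEN-IDS stands in for the articles that are already known, and the regexp and the ID layout are assumptions for the example site.

```elisp
(defvar my-foobar-index-regexp
  (concat "<a href=\"\\(\\([0-9][0-9][0-9][0-9]\\)[0-9][0-9]\\.html\\)\">"
          "\\([^<(]+\\)(posted by \\([^)]+\\))</a>")
  "Hypothetical regexp for one index line of the example site.")

(defun my-foobar-collect-headers (page-source seen-ids)
  "Scan PAGE-SOURCE and return a list of header plists, last match first.
Stop at the first article whose Message-ID is a member of SEEN-IDS."
  (let (headers)
    (with-temp-buffer
      (insert page-source)
      (goto-char (point-min))
      (catch 'stop
        (while (re-search-forward my-foobar-index-regexp nil t)
          ;; Copy the groups out before calling anything else.
          (let* ((url (match-string 1))
                 (subject (match-string 3))
                 (from (match-string 4))
                 ;; Assumed ID layout: file name without ".html" plus domain.
                 (id (format "<%s@www.foobar.net>"
                             (file-name-sans-extension url))))
            ;; An already known article means the rest is known, too.
            (when (member id seen-ids)
              (throw 'stop nil))
            (push (list :subject subject :from from :id id :url url)
                  headers)))))
    headers))
```

Stopping at the first known ID relies on the index page listing newer articles first, which is an assumption about the site; if your target lists them the other way round, you will want a different stop condition.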
Now we will explain the user variable shimbun-foobar-groups.
Assume that you have two groups of articles in http://www.foobar.net and that there is a different web page for each group from which the ‘shimbun’ module gathers header information. For example, what’s-new information about the web site is at http://www.foobar.net/whatsnew/index.html, and archive lists of email messages posted to the ML are at http://www.foobar.net/ml/index.html. In such a case you may want to access the groups as the ‘shimbun’ folders ‘foobar.whatsnew’ and ‘foobar.ml’. If so, put the following S expressions in the ‘sb-foobar.el’ file.
    (defvar shimbun-foobar-url "http://www.foobar.net")
    (defvar shimbun-foobar-group-path-alist
      '(("whatsnew" . "/whatsnew/index.html")
        ("ml" . "/ml/index.html")))
    (defvar shimbun-foobar-groups
      (mapcar 'car shimbun-foobar-group-path-alist))
    (luna-define-method shimbun-index-url ((shimbun shimbun-foobar))
      (concat shimbun-foobar-url
              (cdr (assoc (shimbun-current-group-internal shimbun)
                          shimbun-foobar-group-path-alist))))
You can get the current group by using shimbun-current-group-internal. You can use it in the shimbun-get-headers method (or others) in order to change its behavior according to the current group.
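The lookup that such a shimbun-index-url method performs can be checked on its own. In the sketch below the group name is passed in as an ordinary argument instead of being taken from a ‘shimbun’ instance, and the ‘my-’ names are hypothetical copies of the variables shown above.

```elisp
(defvar my-foobar-url "http://www.foobar.net")

(defvar my-foobar-group-path-alist
  '(("whatsnew" . "/whatsnew/index.html")
    ("ml" . "/ml/index.html"))
  "Alist of group names and the paths of their index pages.")

(defun my-foobar-index-url (group)
  "Return the index page URL for GROUP, or signal an error if it is unknown."
  (let ((path (cdr (assoc group my-foobar-group-path-alist))))
    (unless path
      (error "Unknown group: %s" group))
    (concat my-foobar-url path)))
```

Signalling an error for an unknown group is a choice made for this sketch; it merely makes a misspelled group name visible instead of silently yielding the top page.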
Each ‘shimbun’ module needs at least one group. There is no special rule for naming a group, but if you cannot think of a good name, use ‘news’ or ‘main’.
The shimbun-article method defined in the ‘shimbun.el’ module gets the URL from the Xref information of a header, fetches the web page source from that URL, and calls shimbun-make-contents in the working buffer holding the source. The major job of shimbun-make-contents is to process that HTML. Imagine that a working buffer holds the web page source of an article. The shimbun-make-contents method defined in the ‘shimbun.el’ module inserts (i) header information at the top of the buffer, (ii) ‘<html>’, ‘<body>’, etc. right after that information, and (iii) ‘</body>’ and ‘</html>’ at the end of the buffer. The MUA then displays the article as an HTML mail.
Not only HTML articles, but also articles in the ‘text/plain’ format can be generated. See section Making text/plain articles.
If you don’t want to process an article, you don’t have to define shimbun-make-contents in the ‘sb-foobar.el’ module.
If you want to remove some part of the web page source of an article at its top and its end, set shimbun-foobar-content-start to a regexp that matches the content start and shimbun-foobar-content-end to a regexp that matches the content end.
    (defvar shimbun-foobar-content-start "^<body>$")
    (defvar shimbun-foobar-content-end "^<\/body>$")
shimbun-clear-contents, which is called by the shimbun-make-contents defined in the ‘shimbun.el’ module, will remove the HTML source from point-min to shimbun-foobar-content-start and from shimbun-foobar-content-end to point-max using the regexps. Note that it will not remove any HTML source when either of the regexp searches fails.
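The effect of the two regexps can be imitated in a temporary buffer. The function below is not the implementation in ‘shimbun.el’; it is a minimal sketch of the behavior described above, assuming that the text matched by the start regexp and the text matched by the end regexp are removed together with what lies outside them.

```elisp
(defun my-foobar-trim-contents (html start-regexp end-regexp)
  "Return HTML with everything outside START-REGEXP and END-REGEXP removed.
Return HTML unchanged when either of the searches fails."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    (let (start end)
      (when (and (re-search-forward start-regexp nil t)
                 (setq start (point))
                 (re-search-forward end-regexp nil t)
                 (setq end (match-beginning 0)))
        ;; Delete the tail first so that START stays a valid position.
        (delete-region end (point-max))
        (delete-region (point-min) start)))
    (buffer-string)))
```

With "^<body>$" and "^</body>$" as the regexps, only the lines between the two tags remain; a page lacking either tag comes back untouched, which mirrors the note about failing searches.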
If you want to remove more unnecessary parts (e.g. advertisements) thoroughly, define shimbun-clear-contents in your new ‘sb-foobar.el’ file as follows:
    (luna-define-method shimbun-clear-contents :around ((shimbun shimbun-foobar)
                                                        header)
      ;; cleaning up
      (while (re-search-forward "..." nil t nil)
        (delete-region (match-beginning 0) (match-end 0)))
      (luna-call-next-method))
For more details, see shimbun-make-contents in the ‘sb-ibm-dev.el’ file.
As mentioned in the subsection Getting web page and header information, if articles do not each have a unique URL, you have to ask Emacs to remember the body of each article when gathering header information. In such a case you don’t have to get a web page from the URL of Xref in the ‘shimbun-article’ method. Just get the text from Emacs’ memory and pretty-print it. For more details, see the definitions of the ‘shimbun-article’ method in ‘sb-palmfan.el’, ‘sb-dennou.el’ or ‘sb-tcup.el’.
There are some well-known mailing list managers (or archivers). If you find the name of one of those mailing list managers in a web page source when you analyze it in the step described in section Getting web page and header information, you are very lucky(13). The modules ‘sb-mailman.el’, ‘sb-mhonarc.el’, ‘sb-fml.el’ and ‘sb-mailarc.el’ already have the shimbun-get-headers method and so on; once you write the small amount of code that is not defined in those ‘shimbun’ modules, your new ‘sb-foobar.el’ module works!
If you use the ‘sb-mailman.el’ module, write the following S expressions to the top of the ‘sb-foobar.el’ file:
    (require 'sb-mailman)
    (luna-define-class shimbun-foobar (shimbun-mailman) ())
The above means that the ‘shimbun’ module ‘shimbun-foobar’ inherits the shimbun-mailman class(14), and methods defined in the ‘sb-mailman.el’ module will be used in ‘shimbun-foobar’ by default. You can override some of the parent methods if necessary.
See the ‘sb-pilot-mailsync.el’ file for a sample that uses the ‘sb-mailman.el’ module. You can see how easy it is to create a new ‘shimbun’ module by using such parent modules.
Note that there are localized versions of such mailing list managers; for example, some of them show Date information in Japanese. The modules ‘sb-mailman.el’, ‘sb-mhonarc.el’, ‘sb-fml.el’ and ‘sb-mailarc.el’ assume that mailing list managers are not localized.
If you want to read via ‘shimbun’ a web site that uses a localized mailing list manager, you may have to override some methods of the parent module.
Even if the MUA is reinforced by emacs-w3m so as to be able to read HTML articles, ‘text/plain’ articles might be more convenient in some cases. There are two ways to make the ‘sb-foobar’ module generate ‘text/plain’ articles rather than ‘text/html’ articles. One way is to inherit the ‘sb-text’ module:
    (require 'sb-text)
    (luna-define-class shimbun-foobar (shimbun-text) ())
The ‘sb-text’ module provides the shimbun-make-contents method which generates articles in the ‘text/plain’ format. This will be useful for ‘shimbun’ modules handling web sites which put up only text articles.
The other way is to set the shimbun-foobar-prefer-text-plain variable to non-nil. This makes the shimbun-make-contents method generate articles in the ‘text/plain’ format (actually, it uses the functions provided by the ‘sb-text’ module). Note that this is effective only for modules which inherit the default shimbun-make-contents method (in particular, modules which inherit the ‘sb-text’ module are not affected). The advantage of this way is that users can easily switch between ‘text/plain’ articles and ‘text/html’ articles.
The default value for the shimbun-foobar-prefer-text-plain variable is nil if it is not defined. So, it defaults to nil in every ‘shimbun’ module except for the modules ‘sb-asahi.el’ and ‘sb-yomiuri.el’.
In addition, you can use the variables shimbun-foobar-text-content-start and shimbun-foobar-text-content-end instead of shimbun-foobar-content-start and shimbun-foobar-content-end to extract significant text in web pages (see section Displaying an article). If the former are not defined, their values default to those of the latter.
Whichever way you use, you should note that ‘text/plain’ articles cannot contain images, links, etc.
“Zenkaku” or “zenkaku character(s)” is a term commonly used to refer to Japanese wide characters, and “hankaku” is the opposite term, used for ordinary ASCII characters. There is a complete set of zenkaku characters corresponding to at least the ASCII character set.
Some Japanese web sites tend to use zenkaku characters a lot, and those articles might not necessarily be comfortable to read. If you feel so, you can use this feature, which converts those zenkaku ASCII characters into hankaku. To do that, set the shimbun-foobar-japanese-hankaku variable to t, where foobar is the name of the server to which you subscribe for shimbun articles. That is, you have to set it per server. If you prefer to convert zenkaku to hankaku only in the body of articles, use the value body instead of t. Conversely, the value header or subject specifies that the conversion be performed only in subjects.
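If you wonder what such a conversion amounts to, the sketch below maps the zenkaku forms of the ASCII characters to their hankaku counterparts by plain code point arithmetic. It is not the code that ‘shimbun’ runs when the variable is set; it only assumes the Unicode layout in which the fullwidth forms U+FF01 to U+FF5E mirror ASCII 0x21 to 0x7E and U+3000 is the wide space. The function name is hypothetical.

```elisp
(defun my-zenkaku-ascii-to-hankaku (string)
  "Return STRING with zenkaku ASCII characters replaced by hankaku ones."
  (concat
   (mapcar (lambda (ch)
             (cond
              ;; Fullwidth "!" to "~" sit exactly #xFEE0 above ASCII.
              ((and (>= ch #xFF01) (<= ch #xFF5E)) (- ch #xFEE0))
              ;; The ideographic space becomes an ordinary space.
              ((= ch #x3000) ?\s)
              ;; Anything else, kana and kanji included, is kept as is.
              (t ch)))
           string)))
```

Characters outside those two ranges are left untouched, so Japanese text itself is not altered; only the wide letters, digits and punctuation become narrow.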
sb-ant.el | sb-html.el | sb-info.el | sb-texinfo.el |
sb-gud.el | sb-image.el | sb-rmail.el | sb-w3.el |
This document was generated by TSUCHIYA Masatoshi on January 30, 2019 using texi2html 1.82.