Complex Pattern Finding in Large Data Sets
In the later part of my PhD work I found that a Bayesian
recurrent neural network (a network using Bayesian logic
and a frequentist approach to set the weights) is an
excellent tool for data mining in large data sets.
The network was originally invented by Anders Lansner and
Örjan Ekeberg in the SANS group and has been improved by
Anders Holst and Anders Sandberg.
It finds patterns tremendously quickly because it works
on global statistics.
The method has been successfully applied to find
syndromes in the WHO database.
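As a rough illustration of what "working on global statistics" can mean,
here is a minimal sketch, not the published network. It assumes that the
weight between two items is the logarithm of their observed joint frequency
divided by the product of their individual frequencies, all taken from plain
counts over the whole data set. The function name and the toy records are
made up for this example, and smoothing of rare counts is left out.

```python
import math
from collections import Counter
from itertools import combinations


def pairwise_log_weights(records):
    """Return log(P(a,b) / (P(a) * P(b))) for every item pair seen together.

    `records` is a list of sets of items (hypothetical toy format).
    All probabilities are simple relative frequencies (counts / total).
    """
    n = len(records)
    single = Counter()  # how many records contain each item
    pair = Counter()    # how many records contain each pair of items
    for rec in records:
        items = sorted(set(rec))
        single.update(items)
        pair.update(combinations(items, 2))

    weights = {}
    for (a, b), n_ab in pair.items():
        p_a = single[a] / n
        p_b = single[b] / n
        p_ab = n_ab / n
        # Positive: seen together more often than independence predicts.
        # Negative: seen together less often than independence predicts.
        weights[(a, b)] = math.log(p_ab / (p_a * p_b))
    return weights


if __name__ == "__main__":
    toy = [
        {"drugA", "rash"},
        {"drugA", "rash"},
        {"drugB", "headache"},
        {"drugB", "rash"},
    ]
    for key, w in sorted(pairwise_log_weights(toy).items()):
        print(key, round(w, 3))
```

One pass over the data collects the counts, and every weight follows from
those counts alone, which is why this kind of approach can be quick on
large data sets.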
This is published in the thesis, which I will soon provide
a link to here. Because of the copyright rules, and because
the paper is still in the publishing process, I am not
allowed, according to the rules of today, to provide
the actual paper.
The invention which I will present at this site before
April 23rd 2004 builds upon this method, which can be
seen as applying AI in Medicine (AIM).
I have extended it with
another type of rule-based reasoning into a method which
will be applied to two other areas (which may also be abbreviated AIM).
These are two areas where researchers and geek-minded people
are frustrated today, and where this invention will help
to cure the cause of the symptoms which make
us frustrated.
Problems (quite similar to the early warning problem):
-
Even if the method is great, it is unfortunately not
possible to publish it on the web due to the copyright
rules.
-
With this kind of publication there is a tremendous
time lag. The paper was originally submitted in
Jan 2003, returned May 2003, resubmitted
in Aug 2003, returned again for fixes in Feb 2004.
-
When a paper is submitted, it is "locked in" until
the reviewers of the journal have given their view; in
this process several years can easily pass.
-
The paper is submitted to a journal specializing
in neural networks, but this is quite a narrow
area, so the paper will most likely not reach
most of the researchers and application areas
which could make use of this technique.
-
Even though research results should be free for all,
they are in practice locked in.
-
It is not possible to store this kind of knowledge
in a big database and, for instance, perform data mining
on it, due to the "locking in". Purchasing all the papers
that would be needed would cost a tremendous amount.
-
Formats for this kind of information exchange are not
standardized, which makes it very hard to do anything
meaningful with them. The formats which are used
today are almost completely lacking in structure. I use
TeX, some use Word, but both are almost
useless from an information retrieval viewpoint.
SGML was used early on, and HTML was defined with it.
HTML is just another unstructured mess, only usable
for final presentation. HTML is not an information-
preserving format. XML is a subset
of SGML and is therefore less general than SGML.
HTML is a document type in SGML, but HTML cannot
be specified using XML, for instance (XHTML can,
however, thanks to the strict requirement of end tags
in XHTML).
-
Why, with improved technology, should we
go backwards?
-
If I wrote a similar paper and did not send it to a
publisher, I could then publish it on the web, but how
could you then know that what I've written isn't
crap, as it would not be peer reviewed?
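The point above about end tags in HTML versus XHTML can be illustrated
with a small sketch. It uses the XML parser in Python's standard library
and two made-up fragments: one written HTML-style with the closing
paragraph tags left out, and one written XHTML-style with every element
closed. The helper name and the fragments are assumptions for this example
only.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# HTML-style: the </p> end tags are omitted.
html_style = "<body><p>First paragraph<p>Second paragraph</body>"

# XHTML-style: every element has its end tag.
xhtml_style = "<body><p>First paragraph</p><p>Second paragraph</p></body>"

if __name__ == "__main__":
    print("HTML-style fragment parses as XML: ", is_well_formed_xml(html_style))
    print("XHTML-style fragment parses as XML:", is_well_formed_xml(xhtml_style))
```

Only the fragment with explicit end tags can be read by a generic XML
tool, which is the kind of structure needed before documents can be
stored and mined automatically.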
aim
nonutopia.org.
Last modified: Tue Mar 22 03:43:09 CEST 2002