Topic Models - find out hidden content in your documents



Discover content and relationships in your documents that you did not know existed.


Instead of clustering documents and then determining which content predominates in each cluster, you can also go the other way round.

In a first step, the topics present in the whole document collection are identified. In a second step, each document of the collection is related to one or more of these topics. At first glance this hardly differs from clustering, yet there is a real difference, and it lies precisely in the words "one or more".
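The page does not name the model behind its topic list. Assuming it is Latent Dirichlet Allocation (LDA), a widely used topic model, both steps fall out of a single fitting procedure: collapsed Gibbs sampling repeatedly reassigns every word token to a topic; the resulting topic-word counts describe the topics of the whole collection, and each document's topic counts give its mixture of topics. The sketch below is a minimal illustration under that assumption, not the procedure used for the patent analysis; the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus are hypothetical.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to `docs` (lists of tokens) with collapsed Gibbs sampling.

    Returns (topic_word, doc_topic):
      topic_word[k] maps each word to the number of its tokens assigned to topic k
                    (the topics of the whole collection, step one);
      doc_topic[d]  is the estimated topic mixture of document d (step two).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    n_dk = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # tokens of word w in topic k
    n_k = [0] * num_topics                                # tokens per topic
    z = []                                                # current topic of every token

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the token out of the counts ...
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # ... and resample its topic from the (unnormalized) full conditional.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [{w: c for w, c in counts.items() if c > 0} for counts in n_kw]
    return topic_word, doc_topic


# Toy corpus (hypothetical, not the patent data used on this page).
docs = [
    "membrane polymer solvent membrane pore".split(),
    "mold lens glass mold optical".split(),
    "membrane polymer mold lens".split(),
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
```

Every document ends up with a probability for every topic rather than a single label, which is exactly the "one or more topics" property described above.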

With clustering, each document is assigned unambiguously to exactly one cluster. A topic model, in contrast, treats a document as consisting of several parts, each of which can belong to a different topic.
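The difference can be made concrete with per-token topic assignments of the kind a topic model such as LDA produces. In this sketch, the functions `topic_mixture` and `hard_cluster` and the assignments themselves are hypothetical: the clustering-style view collapses the document to its single dominant topic, while the topic-model view keeps the share of every topic.

```python
from collections import Counter


def topic_mixture(token_topics, num_topics):
    """Topic-model view: share of a document's tokens assigned to each topic."""
    counts = Counter(token_topics)
    n = len(token_topics)
    return [counts[k] / n for k in range(num_topics)]


def hard_cluster(token_topics, num_topics):
    """Clustering view: one label for the whole document (its dominant topic)."""
    mixture = topic_mixture(token_topics, num_topics)
    return max(range(num_topics), key=lambda k: mixture[k])


# Hypothetical topic assignments for the 8 tokens of one document:
# half of it is about topic 0, a large part about topic 2, one token about topic 1.
tokens = [0, 0, 0, 0, 2, 2, 2, 1]
mixture = topic_mixture(tokens, 3)
label = hard_cluster(tokens, 3)
print(mixture)  # [0.5, 0.125, 0.375]
print(label)    # 0
```

The hard label discards the fact that more than a third of the document deals with topic 2; the mixture keeps it.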

The following example shows the result of an analysis in which about 450 patents were grouped into 25 topics of 20 keywords each.

Clicking a topic displays a list of the documents belonging to it, sorted in descending order by the number of the topic's words that each document contains.
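The ordering described here can be sketched as follows. The helper `rank_documents` and the sample documents are hypothetical, and the sketch assumes that every occurrence of a topic keyword in a document counts and that documents containing none of the keywords are left out; counting distinct keywords instead would be an equally plausible reading.

```python
def rank_documents(topic_keywords, docs):
    """Rank documents for one topic, most topic words first.

    topic_keywords: the topic's keyword list (e.g. its top 20 words).
    docs: mapping of document id -> list of tokens.
    Assumption: every occurrence of a keyword counts, and documents
    without any keyword are not listed.
    """
    keywords = set(topic_keywords)
    scores = {
        doc_id: sum(1 for token in tokens if token in keywords)
        for doc_id, tokens in docs.items()
    }
    ranked = [(doc_id, score) for doc_id, score in scores.items() if score > 0]
    # Descending by score; ties broken by document id for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical documents and topic keywords.
docs = {
    "patent-A": "membrane polymer solvent casting membrane".split(),
    "patent-B": "mold lens glass polymer".split(),
    "patent-C": "sheet roll heat cooling".split(),
}
ranking = rank_documents(["membrane", "polymer", "solvent", "pore"], docs)
print(ranking)  # [('patent-A', 4), ('patent-B', 1)]
```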


List of Topics

1. particles size volume particle microns dispersion weight parts casting viscosity surface cm density average thick min distribution inch maximum sheet
2. mold surface lens method material fig member optical cavity surfaces glass form molded area fibers polymer blank silicon body lenses
3. fig end material means member assembly cast portion surface outer upper wall invention body apparatus view shown lower support side
4. reaction invention product anhydride process solvent copolymer carbon formula copolymers reactant polymerization atoms polymers free maleic radical vinyl units dioxolane
5. component percent liquid mixture binder weight polymer cement composition water temperature material compositions silica free strength comprises process reactive solid
6. weight acid methacrylate composition monomer acrylate methyl vinyl compositions preferably monomers organic resin groups glycol polymer copolymer examples parts alkyl
7. polymer acid solution poly conductive electrode film substrate invention claim oxide present form method thin conductivity solvent gas metal electrically
8. layer metal ceramic alloy core tape pattern elements material green method invention casting transducer alloys slip process step article form
9. filler composite particles material matrix film casting composition liquid invention method substrate present suitable particulate viscosity layer filled fluoropolymer al
10. film casting solution solvent polymer layer process surface percent invention thickness films weight preferably substrate support composite liquid formed claim
11. coating coated paper polymer surface fiber material adhesive cast pipe layer method applied aluminum side gloss invention plastic roll pigment
12. casting metal powder composition polymer invention zinc coating parts mold boron water oxide weight nitride agent steel surface peroxide resin
13. polymer graft blend weight melt copolymer polymers polypropylene ethylene film polyethylene blends table molecular extrusion high low copolymers poly density
14. polymer str liquid polymers ml group hours weight groups reaction formula ch temperature cf process prepared film fluorine compound form
15. polymer weight drug release screw nut molecular lead method casting amount adhesive composition controlled claim system clay high threads cavity
16. polymer acrylic cast composition monomer polymerization preferably temperature fabric resin acrylate material heat time minutes method initiator present reaction amount
17. membrane membranes polymer solvent separation process temperature gas preferably mixture solution porous water pore poly weight flux invention casting cast
18. invention present polymer fig high cast low surface rubber art liquid shown materials provide accordance effect light improved diameter conventional
19. water solution film polyvinyl soluble cellulose alcohol added aqueous acetate polymer weight acid emulsion starch films reaction mixture cast vinyl
20. membrane polymer solution water surface resin hydrophilic groups casting invention filtration claim matrix acid structure ml method modifying modified system
21. sheet temperature roll heat side casting mm melt obtained cooling cathode cell surface rate process crystallization degree high treatment thickness
22. acid aromatic grams invention amine solution mole bis mixture chloride molecular compounds added prepared formula polyimide str ester temperature reaction
23. polymerization plate monomer methacrylate weight resin methyl surface zone syrup mm parts synthetic process water obtained belts group polymer table
24. parts polyurethane diisocyanate reaction isocyanate urethane prepolymer groups glass mixture added molecular diol invention radiation temperature hydroxyl glycol polyol polyether
25. product process cross resin weight invention catalyst material claim products polymer pat set linking plastic number making linked present polyester
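Each line of the list above shows the highest-weighted words of one topic. Assuming the fitted model provides, for every topic, a weight per word (for example the number of tokens of that word assigned to the topic), such keyword lists come from a simple sort. The function `top_keywords` and the weights in the example are hypothetical.

```python
def top_keywords(topic_word_weights, n=20):
    """Return the n highest-weighted words of each topic, best first.

    topic_word_weights: one dict per topic mapping word -> weight.
    Ties are broken alphabetically so the output is deterministic.
    """
    return [
        [word for word, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for weights in topic_word_weights
    ]


# Hypothetical word weights for two topics.
topics = [
    {"membrane": 9, "polymer": 7, "solvent": 4, "pore": 2},
    {"mold": 8, "lens": 6, "glass": 6, "optical": 3},
]
keywords = top_keywords(topics, n=3)
print(keywords)  # [['membrane', 'polymer', 'solvent'], ['mold', 'glass', 'lens']]
```

With `n=20` and a real vocabulary, this yields lines of the same shape as the 25 topics listed above.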