Runs - Blog 2009

BIT09P

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: BIT09P
  • Participant: BIT
  • Track: Blog
  • Year: 2009
  • Submission: 8/30/2009
  • Type: automatic
  • Task: feed
  • MD5: 5f68c75a879601f7830e97eaab34de69
  • Run description: BIT09P is the same as BIT09PH, except that it uses only the permalinks-indexed corpus.

BIT09PH

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: BIT09PH
  • Participant: BIT
  • Track: Blog
  • Year: 2009
  • Submission: 8/30/2009
  • Type: automatic
  • Task: feed
  • MD5: 2aa506357a64c51094b9dca691a397f9
  • Run description: BIT09PH uses a Wikipedia corpus to obtain relevance feedback documents for the query model, and uses an opinion lexicon to generate the query model for opinionated faceted blog distillation. We also use a metasearch engine to collect personal blogs from blogs.myspace.com to generate the query model for personal faceted blog distillation. Results come from the permalinks- and homepages-indexed corpus. We use a Global Representation model, in which all posts from the same feed are combined into one document when computing the relevance score.

combined

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: combined
  • Participant: USI
  • Track: Blog
  • Year: 2009
  • Submission: 8/28/2009
  • Type: automatic
  • Task: feed
  • MD5: 8067c71b01d25d510d657d8556af3465
  • Run description: In this run, we combine the two other runs (OWA and Regularization) with a linear combination. It takes into account both the similarity of the posts to other retrieved posts (via regularization) and good aggregation over the posts (via OWA), and generates a new score for each feed.
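
The linear combination described above can be sketched as follows. This is a minimal sketch, not the participants' actual code: the min-max normalization and the mixing weight `alpha` are assumptions, since the run description does not publish them.

```python
def combine_runs(scores_a, scores_b, alpha=0.5):
    """Linearly combine two per-feed score dicts after min-max normalization.

    `alpha` is a hypothetical mixing weight; the run does not state its value.
    """
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # guard against a constant score list
        return {k: (v - lo) / span for k, v in scores.items()}

    a, b = normalize(scores_a), normalize(scores_b)
    feeds = set(a) | set(b)
    # Feeds missing from one run contribute 0 from that run.
    return {f: alpha * a.get(f, 0.0) + (1 - alpha) * b.get(f, 0.0) for f in feeds}
```

Each feed's final score blends its (normalized) score from both input runs, so a feed must do reasonably well in both to rank highly.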

FEUPirlab1

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: FEUPirlab1
  • Participant: FEUP
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: f2b1b34fcbca9140be0ecf1b96cdcdf9
  • Run description: Run combining BM25 scores from a post search.

FEUPirlab2

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: FEUPirlab2
  • Participant: FEUP
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: 2975bc566d4bc6c4e2483212ac82953d
  • Run description: Run combining BM25 scores from a post search, including a boost for posts with invalid dates.

FEUPirlab3

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: FEUPirlab3
  • Participant: FEUP
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: 71bfa7e3b9b486e4eb25306c3002399a
  • Run description: Run combining BM25 scores from a post search, including a boost for posts with invalid dates and a boost for newer posts.

FEUPirlab4

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: FEUPirlab4
  • Participant: FEUP
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: 4cbfc8ffcd1c479bb7a933ef794e8f68
  • Run description: Run combining BM25 scores from a post search, including a boost for posts with invalid dates and a boost for older posts.
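
The four FEUP runs vary a single recipe: sum the BM25 post scores for a feed, optionally boosting invalid-date posts and newer (or older) posts. A minimal sketch of one variant follows; the boost values and the decay form are hypothetical, since the runs do not publish them.

```python
def feed_score(posts, boost_invalid=1.2, boost_newer=0.1):
    """Combine BM25 post scores into one feed score (sketch; boosts hypothetical).

    Each post is a tuple (bm25, date_valid, age_days). Posts with invalid dates
    get a fixed multiplicative boost; valid-dated posts get a recency boost
    that decays with age.
    """
    total = 0.0
    for bm25, date_valid, age_days in posts:
        if not date_valid:
            total += bm25 * boost_invalid
        else:
            total += bm25 * (1.0 + boost_newer / (1.0 + age_days))
    return total
```

Flipping the recency term to favor large `age_days` would give the "boost for older posts" variant (FEUPirlab4).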

ICTNETBDRUN1

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: ICTNETBDRUN1
  • Participant: ICTNET
  • Track: Blog
  • Year: 2009
  • Submission: 8/31/2009
  • Type: automatic
  • Task: feed
  • MD5: 280e25cc3fe00b6380c1b88988a2f074
  • Run description: For each feed, we use only the information from the top 10,000 ad hoc relevant posts to decide the facet value.

ICTNETBDRUN2

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: ICTNETBDRUN2
  • Participant: ICTNET
  • Track: Blog
  • Year: 2009
  • Submission: 8/31/2009
  • Type: automatic
  • Task: feed
  • MD5: 1f668943a4ceebf2aa410ca876927529
  • Run description: For each feed, we use only the information from the top 10,000 ad hoc relevant posts to decide the facet value.

ICTNETBDRUN3

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: ICTNETBDRUN3
  • Participant: ICTNET
  • Track: Blog
  • Year: 2009
  • Submission: 8/31/2009
  • Type: automatic
  • Task: feed
  • MD5: 7eb2a28013af4a86c62fbec09d845f1a
  • Run description: For each feed, we divide the corresponding posts according to the day on which they were crawled, and take the relevance distribution across days into account for the feed relevance judgement.

ICTNETBDRUN4

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: ICTNETBDRUN4
  • Participant: ICTNET
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: 1982689bf4a5088516d1c84d708447ae
  • Run description: This run is similar to ICTNETBDRUN3, except for some strategy details.

ICTNETTSRun1

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: ICTNETTSRun1
  • Participant: ICTNET
  • Track: Blog
  • Year: 2009
  • Submission: 8/27/2009
  • Type: automatic
  • Task: topstories
  • MD5: f165bee14cc83993f4bea81aea566898
  • Run description: A straightforward method that uses the sum of BM25 scores over the posts of the given day to measure the importance of headlines.

ICTNETTSRun2

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: ICTNETTSRun2
  • Participant: ICTNET
  • Track: Blog
  • Year: 2009
  • Submission: 8/27/2009
  • Type: automatic
  • Task: topstories
  • MD5: aaa5f5f0765ec95a9f0efab74ef24c92
  • Run description: A straightforward method that uses the sum of BM25 scores over the posts of the given day to measure the importance of headlines.

ICTNETTSRun3

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: ICTNETTSRun3
  • Participant: ICTNET
  • Track: Blog
  • Year: 2009
  • Submission: 8/27/2009
  • Type: automatic
  • Task: topstories
  • MD5: 8fb2d1eaafae6ce2e6be0361260fc8c6
  • Run description: A straightforward method that uses the sum of BM25 scores over the posts of the given day to measure the importance of headlines.

ICTNETTSRun4

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: ICTNETTSRun4
  • Participant: ICTNET
  • Track: Blog
  • Year: 2009
  • Submission: 8/27/2009
  • Type: automatic
  • Task: topstories
  • MD5: ab76383512b0e9699ccedef83cca9f17
  • Run description: A straightforward method that uses the sum of BM25 scores over the posts of the given day to measure the importance of headlines.
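
The ICTNETTSRun* approach scores each headline by summing BM25 scores of the day's posts against the headline's terms. A self-contained sketch of the standard BM25 formula and the summation follows; the tokenization and parameter values (k1, b) are assumptions, since the runs do not state them.

```python
import math

def bm25_score(query_terms, doc_terms, df, n_docs, avgdl, k1=1.2, b=0.75):
    """Standard BM25 score of one post (doc_terms) against headline terms.

    df: per-term document frequency; n_docs: collection size; avgdl: average
    document length. k1 and b are the usual defaults, assumed here.
    """
    dl = len(doc_terms)
    tf = {}
    for t in doc_terms:
        tf[t] = tf.get(t, 0) + 1
    score = 0.0
    for t in query_terms:
        if t not in tf:
            continue
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
        score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avgdl))
    return score

def headline_importance(headline_terms, day_posts, df, n_docs, avgdl):
    """Headline importance = sum of BM25 scores over the given day's posts."""
    return sum(bm25_score(headline_terms, p, df, n_docs, avgdl) for p in day_posts)
```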

IlpsBDm1T

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: IlpsBDm1T
  • Participant: UAms
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: c2448d23613083db59032a22b6cd20ca
  • Run description: Model1 on title-only index using queries expanded against a news collection and Wikipedia.

IlpsBDm2T

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: IlpsBDm2T
  • Participant: UAms
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: fc99c3bd89822c18cef20ee5b406b898
  • Run description: Model2 on title-only index using queries expanded against a news collection and Wikipedia.

IlpsBDmxfT

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: IlpsBDmxfT
  • Participant: UAms
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: 8d6d8adeee70fba4203d69e4527bb1e4
  • Run description: Expanded queries (news and Wikipedia) on title-only index using model1 and model2, externally combined with 0.6 model1, 0.4 model2. Top 300 feeds used in combination: normalize scores, weight them, combine (CombMNZ). Afterwards indicators for facets are applied.
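
The fusion step named above (normalize, weight, CombMNZ) can be sketched as follows. CombMNZ is the standard Fox and Shaw fusion rule; the min-max normalization is the usual choice, assumed here since the run does not specify its variant.

```python
def comb_mnz(run_scores, weights=None):
    """CombMNZ fusion: min-max normalize each run, weight it, then multiply
    each feed's summed score by the number of runs that retrieved it.

    run_scores: list of {feed_id: score} dicts.
    weights: per-run multipliers (e.g. 0.6 for model1, 0.4 for model2).
    """
    weights = weights or [1.0] * len(run_scores)
    norm = []
    for scores in run_scores:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        norm.append({k: (v - lo) / span for k, v in scores.items()})
    fused = {}
    for w, scores in zip(weights, norm):
        for feed, s in scores.items():
            total, hits = fused.get(feed, (0.0, 0))
            fused[feed] = (total + w * s, hits + 1)
    # The "MNZ" part: multiply by the number of runs that returned the feed.
    return {feed: total * hits for feed, (total, hits) in fused.items()}
```

The multiplication by the hit count rewards feeds retrieved by both models, which matches the description's use of the top 300 feeds from each model.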

IlpsBDmxT

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: IlpsBDmxT
  • Participant: UAms
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: 8f0f311e33e430060b4be727dbf6ce90
  • Run description: Expanded queries (news and Wikipedia) on title-only index using model1 and model2, externally combined with 0.6 model1, 0.4 model2. Top 300 feeds used in combination: normalize scores, weight them, combine (CombMNZ).

IlpsTSExP

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: IlpsTSExP
  • Participant: UAms
  • Track: Blog
  • Year: 2009
  • Submission: 8/29/2009
  • Type: automatic
  • Task: topstories
  • MD5: 90a2f8b2ea251e7d61841f133d81aab2
  • Run description: Extract key terms from the top 5000 blog posts and use them as queries on the headline index with date restrictions.

IlpsTSExT

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: IlpsTSExT
  • Participant: UAms
  • Track: Blog
  • Year: 2009
  • Submission: 8/29/2009
  • Type: automatic
  • Task: topstories
  • MD5: 6b12291eca5590615602c043de0f0090
  • Run description: Extract key terms from the top 5000 blog post titles and use them as queries on the headline index with date restrictions.

IlpsTSHlP

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: IlpsTSHlP
  • Participant: UAms
  • Track: Blog
  • Year: 2009
  • Submission: 8/30/2009
  • Type: automatic
  • Task: topstories
  • MD5: b8e232b2d2c36baec917d77cb548d421
  • Run description: Calculate headline likelihood on full post index of the given period. Use comments as association strength.

IlpsTSHlT

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: IlpsTSHlT
  • Participant: UAms
  • Track: Blog
  • Year: 2009
  • Submission: 8/30/2009
  • Type: automatic
  • Task: topstories
  • MD5: b83e0348434b1cdb6280aaaf4c1a4c9c
  • Run description: Calculate headline likelihood on title index of the given period. Use comments as association strength.

IowaSBD0901

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: IowaSBD0901
  • Participant: IowaS
  • Track: Blog
  • Year: 2009
  • Submission: 8/28/2009
  • Type: automatic
  • Task: feed
  • MD5: 2c62780cf42fc2f79940b2aebf330242
  • Run description: The posts are indexed using the Lucene engine. The queries are composed using the tag and up to 10 terms extracted from the tag using TF-IDF. Two hundred posts are retrieved and analyzed according to the query's facets. A LingPipe classifier is used for the opinionated/factual facet, the length of the posts is used for the in-depth/shallow facet, and the number of personal pronouns is used for the personal/official facet. External resource: LingPipe sentiment polarity training data.

IowaSBD0902

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: IowaSBD0902
  • Participant: IowaS
  • Track: Blog
  • Year: 2009
  • Submission: 8/28/2009
  • Type: automatic
  • Task: feed
  • MD5: 653bdb878968ce3af3cf6e1c671f87b1
  • Run description: The posts are indexed using the Lucene engine. The queries are composed using the tag and up to 10 terms extracted from the tag using TF-IDF. Fifty posts are retrieved for pseudo-feedback. Using the latent Dirichlet relevance model (Ha-Thuc and Srinivasan), we obtain ten query-expansion terms, which are added to the original query. Two hundred posts are then retrieved and analyzed according to the query's facets. A LingPipe classifier is used for the opinionated/factual facet, the length of the posts is used for the in-depth/shallow facet, and the number of personal pronouns is used for the personal/official facet. External resource: LingPipe sentiment polarity training data.
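
The personal/official heuristic used in the IowaS runs (count personal pronouns) can be sketched as below. The pronoun list, tokenization, and threshold are all hypothetical; the runs only state that the pronoun count is the signal.

```python
# Hypothetical first-person pronoun list and rate threshold.
PERSONAL_PRONOUNS = {"i", "me", "my", "mine", "we", "us", "our", "ours"}

def personal_facet(post_text, threshold=0.02):
    """Label a post personal vs. official by its rate of first-person pronouns."""
    tokens = post_text.lower().split()
    if not tokens:
        return "official"
    rate = sum(t.strip(".,!?") in PERSONAL_PRONOUNS for t in tokens) / len(tokens)
    return "personal" if rate >= threshold else "official"
```

The in-depth/shallow facet would analogously threshold on post length, per the description.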

IowaSBT0901

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: IowaSBT0901
  • Participant: IowaS
  • Track: Blog
  • Year: 2009
  • Submission: 8/27/2009
  • Type: automatic
  • Task: topstories
  • MD5: f5a3cc6fc65cc2c4bc11d0db00ea1e5d
  • Run description: Ranking headlines for a query date: Given a query date (d) we start with the URLs of the corresponding set of headlines. As directed these correspond to headlines published on d-1, d and d+1 dates. We search the blog dataset for posts that contain a headline URL. Headline URLs are then ranked by the number of posts returned. The top 100 are submitted for the query date. Ranking posts for a headline: The posts are indexed using Lucene. When a URL is found in a post, we extract the surrounding window of text. The size of this window is 800 characters before and after the URL (including HTML). Given a headline (h) these texts are accumulated from all posts containing the URL for h. We then extract keywords from these texts using our latent Dirichlet relevance model (Ha-Thuc and Srinivasan, A Latent Dirichlet Framework for Relevance Modeling, 5th Asia Information Retrieval Symposium, 2009). These keywords are used to query the blog dataset. The top 10 are returned as the set of posts to read for headline h.

IowaSBT0902

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: IowaSBT0902
  • Participant: IowaS
  • Track: Blog
  • Year: 2009
  • Submission: 8/27/2009
  • Type: automatic
  • Task: topstories
  • MD5: 4852ace2dd62614f220e861d1fc19b67
  • Run description: Ranking headlines for a query date: Given a query date (d) we start with the URLs of the corresponding set of headlines. As directed these correspond to headlines published on d-1, d and d+1 dates. We search the blog dataset for posts that contain a headline URL. Headline URLs are then ranked by the number of posts returned. The top 100 are submitted for the query date. Ranking posts for a headline: The posts are indexed using Lucene. When a URL is found in a post, we extract the surrounding window of text. The size of this window is 800 characters before and after the URL (including HTML). Given a headline (h) these texts are accumulated from all posts containing the URL for h. We then extract keywords from these texts using our latent Dirichlet relevance model (Ha-Thuc and Srinivasan). These keywords are used to query the blog dataset. In the set of returned posts the ones containing the headline URL are boosted to the top. The top 10 are returned as the set of posts to read for headline h.

IowaSBT0903

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: IowaSBT0903
  • Participant: IowaS
  • Track: Blog
  • Year: 2009
  • Submission: 8/27/2009
  • Type: automatic
  • Task: topstories
  • MD5: ea0a7e1e1130021785f53c14e971bc53
  • Run description: Ranking headlines for a query date: Given query date d, we rank headlines by an intensity measure indicating how frequently the headlines are discussed in posts over the three days (d-1), d, and (d+1). The intensity is computed as a weighted sum over relevant posts, where the weights are the semantic similarity between the keywords of the headline and the content of the posts. The keywords for each headline are extracted from windows of text in the corpus surrounding the URL of the headline by our latent Dirichlet relevance model (Ha-Thuc and Srinivasan). Ranking posts for a headline: The posts are indexed using Lucene. When a URL is found in a post, we extract the surrounding window of text. The size of this window is 800 characters before and after the URL (including HTML). Given a headline (h), these texts are accumulated from all posts containing the URL for h. We then extract keywords from these texts using our latent Dirichlet relevance model (Ha-Thuc and Srinivasan). These keywords are used to query the blog dataset. In the set of returned posts, the ones containing the headline URL are boosted to the top. The top 10 are returned as the set of posts to read for headline h.

IowaSBT0904

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: IowaSBT0904
  • Participant: IowaS
  • Track: Blog
  • Year: 2009
  • Submission: 8/27/2009
  • Type: automatic
  • Task: topstories
  • MD5: 04225cba78166df5663ac75c5bb63742
  • Run description: Ranking headlines for a query date: Given query date d, we rank headlines by the posterior probability p(headline | posts), which indicates the importance of the headline in posts over the 3-day window around d. The probability is computed as p(headline | posts) = p(headline) * p(posts | headline). The prior probability p(headline) is proportional to the number of posts containing the URL of the headline. The likelihood is estimated by the similarity between the posts and the headline. Ranking posts for a headline: The posts are indexed using Lucene. When a URL is found in a post, we extract the surrounding window of text. The size of this window is 800 characters before and after the URL (including HTML). Given a headline (h), these texts are accumulated from all posts containing the URL for h. We then extract keywords from these texts using our latent Dirichlet relevance model (Ha-Thuc and Srinivasan). These keywords are used to query the blog dataset. In the set of returned posts, the ones containing the headline URL are boosted to the top. The top 10 are returned as the set of posts to read for headline h.
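
The posterior ranking in IowaSBT0904 factors as a URL-count prior times a post-similarity likelihood. A minimal sketch, assuming the likelihood is approximated by the mean headline-post similarity (the run's exact estimator is not published):

```python
def headline_posterior(n_posts_with_url, similarities):
    """Rank score proportional to p(headline) * p(posts | headline) (sketch).

    Prior: proportional to the number of posts containing the headline URL.
    Likelihood: approximated here by the mean headline-post similarity,
    which is an assumption for illustration.
    """
    if not similarities:
        return 0.0
    prior = n_posts_with_url
    likelihood = sum(similarities) / len(similarities)
    return prior * likelihood
```

Headlines for the query date would then be sorted by this score in descending order.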

KLEClusPrior

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: KLEClusPrior
  • Participant: POSTECH_KLE
  • Track: Blog
  • Year: 2009
  • Submission: 8/28/2009
  • Type: automatic
  • Task: topstories
  • MD5: 118c78605d8ad8f762212d1467a13617
  • Run description: Computes the relevance between the date-query language model and the headline language model. Documents are clustered to estimate the headline language model. The time information and term weighting of each headline are used for the prior probability.

KLECluster

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: KLECluster
  • Participant: POSTECH_KLE
  • Track: Blog
  • Year: 2009
  • Submission: 8/28/2009
  • Type: automatic
  • Task: topstories
  • MD5: 8be291210ff30d14b49a4d57cd8cf9a6
  • Run description: Computes the relevance between the date-query language model and the headline language model. Documents are clustered to estimate the headline language model.

KLEFeed

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: KLEFeed
  • Participant: POSTECH_KLE
  • Track: Blog
  • Year: 2009
  • Submission: 8/28/2009
  • Type: automatic
  • Task: topstories
  • MD5: 60605992f71125f9963eb1156dcb25ae
  • Run description: Computes the relevance between the date-query language model and the headline language model. Feeds are used to estimate the headline language model.

KLEFeedPrior

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: KLEFeedPrior
  • Participant: POSTECH_KLE
  • Track: Blog
  • Year: 2009
  • Submission: 8/28/2009
  • Type: automatic
  • Task: topstories
  • MD5: 2599c797fc19822810054c1cc02201c9
  • Run description: Computes the relevance between the date-query language model and the headline language model. Feeds are used to estimate the headline language model. The time information and term weighting of each headline are used for the prior probability.

nounfull

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: nounfull
  • Participant: knowcenter
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: 54c44fc086ddbd2f914c2bb9d391908c
  • Run description: We created a Lucene index (search index size 45 GB, approximately 800 000 successfully parsed feeds), and then searched in the Lucene index for the given query terms. We used 2500 blog posts to find the 100 most relevant blogs for the topics. After that, we vectorized the resulting 3800 feeds. For training, we manually labeled 83 blogs into the facet categories. For the faceted distillation, we classified the blogs into the facet categories using a Support Vector Machine based on LibLinear.

OWA

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: OWA
  • Participant: USI
  • Track: Blog
  • Year: 2009
  • Submission: 8/28/2009
  • Type: automatic
  • Task: feed
  • MD5: 6dabc2bdf23c9632786024ab95c19bf9
  • Run description: In this run, we used OWA (Ordered Weighted Average) operator to combine post relevance scores for a given feed.
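
The OWA operator named above has a standard definition: sort the post scores in descending order and take a weighted sum with a fixed weight vector. A sketch follows; the weight vector shown in the test is hypothetical, since the run does not publish its weights.

```python
def owa(scores, weights):
    """Ordered Weighted Average: weights attach to rank positions, not to
    particular posts, so the top-ranked score always gets weights[0].

    In the standard definition len(weights) == len(scores) and the
    weights sum to 1.
    """
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))
```

A top-heavy weight vector makes the feed score depend mostly on its best posts; uniform weights reduce OWA to a plain average.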

pris

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: pris
  • Participant: buptpris___2009
  • Track: Blog
  • Year: 2009
  • Submission: 8/31/2009
  • Type: automatic
  • Task: feed
  • MD5: 76fbed8a63269223690e86e05e527199
  • Run description: We use Indri for the relevance retrieval. The Stanford toolkit is then used for named entity recognition, and a maximum entropy (ME) classifier is used to identify the sentiment orientation. The query-expansion words are obtained by machine learning.

prisb

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: prisb
  • Participant: buptpris___2009
  • Track: Blog
  • Year: 2009
  • Submission: 8/31/2009
  • Type: automatic
  • Task: feed
  • MD5: e3048479f0b4232de5a602a44a13f938
  • Run description: We use Indri for the relevance retrieval. The Stanford toolkit is then used for named entity recognition, and a maximum entropy (ME) classifier is used to identify the sentiment orientation.

punctfull

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: punctfull
  • Participant: knowcenter
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: c5b2c55e776ae47162251121c35e8dd2
  • Run description: We created a Lucene index (search index size 45 GB, approximately 800 000 successfully parsed feeds), and then searched in the Lucene index for the given query terms. We used 2500 blog posts to find the 100 most relevant blogs for the topics. After that, we vectorized the resulting 3800 feeds. For training, we manually labeled 83 blogs into the facet categories. For the faceted distillation, we classified the blogs into the facet categories using a Support Vector Machine based on LibLinear.

RegLDM

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: RegLDM
  • Participant: USI
  • Track: Blog
  • Year: 2009
  • Submission: 8/31/2009
  • Type: automatic
  • Task: feed
  • MD5: 8a39203cc99ddfff1e5fd8c488793aed
  • Run description: We implemented a version of the Large Document Model by concatenating the top relevant posts in each blog to make one large document. We then regularized each blog's relevance score based on its similarity to the other retrieved blogs.

regularized

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: regularized
  • Participant: USI
  • Track: Blog
  • Year: 2009
  • Submission: 8/28/2009
  • Type: automatic
  • Task: feed
  • MD5: 4f2882443bd0fcb232ef47527ad15bce
  • Run description: In this run, we perform a score-regularization step on the post relevance scores to generate new scores for the posts, and then calculate the feed score by simple averaging over the post scores.
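
One common form of score regularization pulls each post's score toward the similarity-weighted average of the other retrieved posts' scores; the sketch below assumes that form and a hypothetical mixing weight `lam`, since the run does not publish its update rule.

```python
def regularize(scores, similarity, lam=0.8):
    """One score-regularization step (sketch): mix each post's score with the
    similarity-weighted average of the other posts' scores.

    scores: list of post relevance scores; similarity: pairwise matrix;
    lam: hypothetical weight on the original score.
    """
    n = len(scores)
    new = []
    for i in range(n):
        sims = [similarity[i][j] for j in range(n) if j != i]
        vals = [scores[j] for j in range(n) if j != i]
        total = sum(sims)
        neighbor = sum(s * v for s, v in zip(sims, vals)) / total if total else scores[i]
        new.append(lam * scores[i] + (1 - lam) * neighbor)
    return new
```

Per the description, the feed score is then the simple average of the regularized post scores, e.g. `sum(new) / len(new)`.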

ri1025rw2b

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: ri1025rw2b
  • Participant: shakwat
  • Track: Blog
  • Year: 2009
  • Submission: 8/30/2009
  • Type: automatic
  • Task: topstories
  • MD5: c8949784df3fe753658613259b324376
  • Run description: For each topic, Random Indexing (Sahlgren 2006) was used to build a semantic space containing the blog posts and the headlines in a window around the date of the topic. This geometric representation of the meaning of the episodes (posts and headlines) was then crawled using a random-walk-like algorithm to find the closest posts for each headline. The ranking of the headlines takes into account the number of steps needed to find n relevant posts for a headline, together with the density of posts around the headline and the average similarity between each headline and the associated posts. For each headline, the posts are ranked w.r.t. their similarity to the headline.

ri1025rw5432

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: ri1025rw5432
  • Participant: shakwat
  • Track: Blog
  • Year: 2009
  • Submission: 8/31/2009
  • Type: automatic
  • Task: topstories
  • MD5: 3a2a5b99841fbb2cd0da864269c1751c
  • Run description: For each topic, Random Indexing (Sahlgren 2006) was used to build a semantic space containing the blog posts and the headlines in a window around the date of the topic. This geometric representation of the meaning of the episodes (posts and headlines) was then crawled using a random-walk-like algorithm to find the closest posts for each headline. The ranking of the headlines takes into account the number of steps needed to find n relevant posts for a headline, together with the density of posts around the headline and the average similarity between each headline and the associated posts. For each headline, the posts are ranked w.r.t. their similarity to the headline.

ri1025rw5h2b

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: ri1025rw5h2b
  • Participant: shakwat
  • Track: Blog
  • Year: 2009
  • Submission: 8/31/2009
  • Type: automatic
  • Task: topstories
  • MD5: 6a626b421e9de25134ef1b21478eeea0
  • Run description: For each topic, Random Indexing (Sahlgren 2006) was used to build a semantic space containing the blog posts and the headlines in a window around the date of the topic. This geometric representation of the meaning of the episodes (posts and headlines) was then crawled using a random-walk-like algorithm to find the closest posts for each headline. The ranking of the headlines takes into account the number of steps needed to find n relevant posts for a headline, together with the density of posts around the headline and the average similarity between each headline and the associated posts. For each headline, the posts are ranked w.r.t. their similarity to the headline.

ri2049rw3

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: ri2049rw3
  • Participant: shakwat
  • Track: Blog
  • Year: 2009
  • Submission: 8/30/2009
  • Type: automatic
  • Task: topstories
  • MD5: 7e1ad0b2ccb809b791c86f2ee44661dd
  • Run description: For each topic, Random Indexing (Sahlgren 2006) was used to build a semantic space containing the blog posts and the headlines in a window around the date of the topic. This geometric representation of the meaning of the episodes (posts and headlines) was then crawled using a random-walk-like algorithm to find the closest posts for each headline. The ranking of the headlines takes into account the number of steps needed to find n relevant posts for a headline, together with the density of posts around the headline and the average similarity between each headline and the associated posts. For each headline, the posts are ranked w.r.t. their similarity to the headline.

runtag

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: runtag
  • Participant: USI
  • Track: Blog
  • Year: 2009
  • Submission: 8/31/2009
  • Type: automatic
  • Task: topstories
  • MD5: 326ce9079776b6b01d3eaf55922f9b2b
  • Run description: Time-aware clustering

sentence

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: sentence
  • Participant: knowcenter
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: ea285f71b79e3c0513be23c87e8b2641
  • Run description: We created a Lucene index (search index size 45 GB, approximately 800 000 successfully parsed feeds), and then searched in the Lucene index for the given query terms. We used 2500 blog posts to find the 100 most relevant blogs for the topics. After that, we vectorized the resulting 3800 feeds. For training, we manually labeled 83 blogs into the facet categories. For the faceted distillation, we classified the blogs into the facet categories using a Support Vector Machine based on LibLinear.

uogTrFBAlr

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: uogTrFBAlr
  • Participant: uogTr
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: fc51046a8e16041905020bb2bb82f6c9
  • Run description: Parameter free DFR model, combined with the voting model and a learning to rank approach.

uogTrFBHlr

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: uogTrFBHlr
  • Participant: uogTr
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: a9cf9dd527d32c569205ec51589a07a8
  • Run description: DFR parameter free model combining voting model, and some features for facets.

uogTrFBMclas

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: uogTrFBMclas
  • Participant: uogTr
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: b99870ef2279c3faea09049a9cdf1fd6
  • Run description: Parameter free DFR model, combined with the voting model and several classifiers.

uogTrFBNclas

Results | Participants | Proceedings | Input | Summary (none) | Summary (first) | Summary (second) | Appendix

  • Run ID: uogTrFBNclas
  • Participant: uogTr
  • Track: Blog
  • Year: 2009
  • Submission: 9/1/2009
  • Type: automatic
  • Task: feed
  • MD5: f11f27aac97e1337c0c9b11df278847e
  • Run description: Parameter free DFR model, combined with the voting model and a classifier.

uogTrTSbmmr

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: uogTrTSbmmr
  • Participant: uogTr
  • Track: Blog
  • Year: 2009
  • Submission: 8/30/2009
  • Type: automatic
  • Task: topstories
  • MD5: 986f5e3fb7bd58e6ad88e9cfd861b65d
  • Run description: Voting Model, Baseline, MMR.

uogTrTSemmrs

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: uogTrTSemmrs
  • Participant: uogTr
  • Track: Blog
  • Year: 2009
  • Submission: 8/30/2009
  • Type: automatic
  • Task: topstories
  • MD5: 2cdef5b78d866de4bfcbf20f7a2638d3
  • Run description: Voting Model, Enriched Headlines, MMR, Supporting Documents

uogTrTStimes

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: uogTrTStimes
  • Participant: uogTr
  • Track: Blog
  • Year: 2009
  • Submission: 8/31/2009
  • Type: automatic
  • Task: topstories
  • MD5: d00e52b4b54c20177f70f9a77a4dde0c
  • Run description: Voting Model, Enriched Headlines, Time Reranking and Supporting Document Ranking

uogTrTSwtime

Results | Participants | Proceedings | Input | Summary (headline) | Summary (blogpost) | Appendix

  • Run ID: uogTrTSwtime
  • Participant: uogTr
  • Track: Blog
  • Year: 2009
  • Submission: 8/31/2009
  • Type: automatic
  • Task: topstories
  • MD5: 42c08f72d8236ab534abc0c054cd33ec
  • Run description: Voting Model, 4Day Boosting, Time Diversity