<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: 5 Tips for Sphinx Indexing</title>
	<atom:link href="http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/</link>
	<description></description>
	<lastBuildDate>Wed, 08 Feb 2012 05:24:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Avrohom Katz</title>
		<link>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/comment-page-1/#comment-15981</link>
		<dc:creator>Avrohom Katz</dc:creator>
		<pubDate>Sat, 29 Aug 2009 19:38:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=1965#comment-15981</guid>
		<description>The current recommendation is to run the search daemon on each machine running your application, not one for every Rails process. But yes, each machine would hold its own index. The daemon does not need many resources to run; it&#039;s really the indexing process that can tax the database. This setup offers the ability to scale as you add instances.  
As I mentioned in the post, the delayed delta method does look promising for people who want more frequent reindexes; however, it only really allows for one daemon to be running.  </description>
		<content:encoded><![CDATA[<p>The current recommendation is to run the search daemon on each machine running your application, not one for every Rails process. But yes, each machine would hold its own index. The daemon does not need many resources to run; it&#039;s really the indexing process that can tax the database. This setup offers the ability to scale as you add instances.<br />
As I mentioned in the post, the delayed delta method does look promising for people who want more frequent reindexes; however, it only really allows for one daemon to be running.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tony</title>
		<link>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/comment-page-1/#comment-15971</link>
		<dc:creator>Tony</dc:creator>
		<pubDate>Sat, 29 Aug 2009 18:05:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=1965#comment-15971</guid>
		<description>One thing I am unclear about is how to test thinking_sphinx.  I am figuring I need to stop my development mode searchd, start my test mode searchd, then add some data and run a search (assuming I have delta indexing enabled...if not, I guess I would have to re-index before running the search after adding the data).  Of course, end the test by stopping the test searchd.  Maybe you only need to start and stop if you have searchd running on the same port for dev and test.   I haven&#039;t gotten this working but if anyone has any testing tips, I would love to know. 
 
At some point in development mode, an &quot;index&quot; directory was created in my rails root with stuff like &quot;index/development/model_name/segments_8&quot; and I&#039;m not sure yet if that is sphinx related.  I do know that when testing, I see nothing like &quot;index/test/model_name/...&quot;.   </description>
		<content:encoded><![CDATA[<p>One thing I am unclear about is how to test thinking_sphinx.  I am figuring I need to stop my development mode searchd, start my test mode searchd, then add some data and run a search (assuming I have delta indexing enabled&#8230;if not, I guess I would have to re-index before running the search after adding the data).  Of course, end the test by stopping the test searchd.  Maybe you only need to start and stop if you have searchd running on the same port for dev and test.   I haven&#039;t gotten this working but if anyone has any testing tips, I would love to know. </p>
<p>At some point in development mode, an &quot;index&quot; directory was created in my rails root with stuff like &quot;index/development/model_name/segments_8&quot; and I&#039;m not sure yet if that is sphinx related.  I do know that when testing, I see nothing like &quot;index/test/model_name/&#8230;&quot;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Derrick</title>
		<link>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/comment-page-1/#comment-15861</link>
		<dc:creator>Derrick</dc:creator>
		<pubDate>Fri, 28 Aug 2009 18:35:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=1965#comment-15861</guid>
		<description>Did I read it correctly that you advocate a searchd instance for every rails process?  Each of those searchd instances would have their own index, right?  Is that a drain on resources; or is this perhaps an engineering tradeoff of resources for stability (which I suppose is a fair tradeoff in the cloud)? </description>
		<content:encoded><![CDATA[<p>Did I read it correctly that you advocate a searchd instance for every rails process?  Each of those searchd instances would have their own index, right?  Is that a drain on resources; or is this perhaps an engineering tradeoff of resources for stability (which I suppose is a fair tradeoff in the cloud)?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Leah Silber</title>
		<link>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/comment-page-1/#comment-15855</link>
		<dc:creator>Leah Silber</dc:creator>
		<pubDate>Fri, 28 Aug 2009 16:57:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=1965#comment-15855</guid>
		<description>@sintaxi: Not quite sure what the most *current* state of affairs is, to be honest... Looking into it :) </description>
		<content:encoded><![CDATA[<p>@sintaxi: Not quite sure what the most *current* state of affairs is, to be honest&#8230; Looking into it :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Leah Silber</title>
		<link>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/comment-page-1/#comment-15853</link>
		<dc:creator>Leah Silber</dc:creator>
		<pubDate>Fri, 28 Aug 2009 16:54:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=1965#comment-15853</guid>
		<description>Thanks for the link Tim! </description>
		<content:encoded><![CDATA[<p>Thanks for the link Tim!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Evan</title>
		<link>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/comment-page-1/#comment-15740</link>
		<dc:creator>Evan</dc:creator>
		<pubDate>Thu, 27 Aug 2009 16:55:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=1965#comment-15740</guid>
		<description>Thanks for this article.  
 
Thinking Sphinx used to not support faceting, time deltas, or excerpting, but now does; those were the &quot;enterprise&quot; features that kept people on Ultrasphinx. Since the addition of those to Thinking Sphinx, Ultrasphinx is no longer being developed, so the upgrade instructions are especially useful.  
 
The basic issue with string sorting in deltas is that you need to map the strings into an absolute space, whereas str2ordinal maps them into a relative space. So the correct algorithm needs to be as follows: 
  * define a bigint Sphinx attribute for each string field you need to sort 
  * downcase and strip your strings of symbols 
  * map them into the lower ASCII space (0 to 35), which is about 5.2 bits per character 
  * pack that into a bigint as a base-36 number, truncating the extra (this gives us up to twelve sort characters per string) 
  * store that bigint in the additional Sphinx attribute at index time 
 
Now your client can request sorted results from the additional attribute, with a little overscan, and post-sort to achieve strict order within the group. Of course this will break down if all your strings start with the same 12 characters.  
 
Ideally Sphinx would let you declare a sortable characters limit and store the truncated portion of the string internally. 
 
 
 </description>
		<content:encoded><![CDATA[<p>Thanks for this article.  </p>
<p>Thinking Sphinx used to not support faceting, time deltas, or excerpting, but now does; those were the &quot;enterprise&quot; features that kept people on Ultrasphinx. Since the addition of those to Thinking Sphinx, Ultrasphinx is no longer being developed, so the upgrade instructions are especially useful.  </p>
<p>The basic issue with string sorting in deltas is that you need to map the strings into an absolute space, whereas str2ordinal maps them into a relative space. So the correct algorithm needs to be as follows:<br />
  * define a bigint Sphinx attribute for each string field you need to sort<br />
  * downcase and strip your strings of symbols<br />
  * map them into the lower ASCII space (0 to 35), which is about 5.2 bits per character<br />
  * pack that into a bigint as a base-36 number, truncating the extra (this gives us up to twelve sort characters per string)<br />
  * store that bigint in the additional Sphinx attribute at index time </p>
<p>Now your client can request sorted results from the additional attribute, with a little overscan, and post-sort to achieve strict order within the group. Of course this will break down if all your strings start with the same 12 characters.  </p>
<p>Ideally Sphinx would let you declare a sortable characters limit and store the truncated portion of the string internally.</p>
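<p>A minimal sketch of the packing steps listed above, written in Python purely for illustration. It assumes base-36 positional packing with digits ordered before letters; the names sort_key, strict_sort, ALPHABET and MAX_CHARS are hypothetical and are not part of Sphinx or Thinking Sphinx. Twelve base-36 digits stay below 2**63, so the result should fit a signed bigint.</p>

```python
import string

# Hypothetical helper names; nothing here is a Sphinx or Thinking Sphinx API.
ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols: 0-9, then a-z
MAX_CHARS = 12  # 36**12 is smaller than 2**63, so 12 symbols fit a signed bigint


def sort_key(text):
    """Pack the first 12 alphanumeric characters of text into one integer."""
    # Downcase, drop anything outside 0-9 and a-z, then truncate to 12 symbols.
    cleaned = [ch for ch in text.lower() if ch in ALPHABET][:MAX_CHARS]
    # Pad short strings with the lowest symbol so every key has the same width.
    cleaned = cleaned + ["0"] * (MAX_CHARS - len(cleaned))
    key = 0
    for ch in cleaned:
        # Base-36 positional encoding: earlier characters carry more weight.
        key = key * 36 + ALPHABET.index(ch)
    return key


def strict_sort(names):
    """Client-side post-sort: packed key first, full lowercase string breaks ties."""
    return sorted(names, key=lambda name: (sort_key(name), name.lower()))
```

<p>In this sketch the value returned by sort_key would be stored at index time, and strict_sort stands in for the client-side post-sort. Strings that share their first 12 cleaned characters get the same key, which is why the tie-breaking step is still needed.</p>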
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Peat</title>
		<link>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/comment-page-1/#comment-15719</link>
		<dc:creator>Tim Peat</dc:creator>
		<pubDate>Thu, 27 Aug 2009 10:06:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=1965#comment-15719</guid>
		<description>I can confirm that Thinking Sphinx DOES SUPPORT FACETING; I am using it at the moment. 
&lt;a href=&quot;http://freelancing-god.github.com/ts/en/facets.htm.&quot; target=&quot;_blank&quot;&gt;http://freelancing-god.github.com/ts/en/facets.ht...&lt;/a&gt; 
 
Thanks for the article. </description>
		<content:encoded><![CDATA[<p>I can confirm that Thinking Sphinx DOES SUPPORT FACETING; I am using it at the moment.<br />
<a href="http://freelancing-god.github.com/ts/en/facets.htm." target="_blank">http://freelancing-god.github.com/ts/en/facets.ht&#8230;</a> </p>
<p>Thanks for the article.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Avrohom Katz</title>
		<link>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/comment-page-1/#comment-15676</link>
		<dc:creator>Avrohom Katz</dc:creator>
		<pubDate>Wed, 26 Aug 2009 21:27:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=1965#comment-15676</guid>
		<description>@caa: I&#039;m pretty sure Thinking Sphinx currently supports facets, if that&#039;s what&#039;s holding you back. Take a peek at the current documentation for details: &lt;a href=&quot;http://freelancing-god.github.com/ts/en/facets.html&quot; target=&quot;_blank&quot;&gt;http://freelancing-god.github.com/ts/en/facets.ht...&lt;/a&gt; </description>
		<content:encoded><![CDATA[<p>@caa: I&#039;m pretty sure Thinking Sphinx currently supports facets, if that&#039;s what&#039;s holding you back. Take a peek at the current documentation for details: <a href="http://freelancing-god.github.com/ts/en/facets.html" target="_blank">http://freelancing-god.github.com/ts/en/facets.ht&#8230;</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Leah Silber</title>
		<link>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/comment-page-1/#comment-15673</link>
		<dc:creator>Leah Silber</dc:creator>
		<pubDate>Wed, 26 Aug 2009 21:06:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=1965#comment-15673</guid>
		<description>Hey Josh -- 
 
The post has been updated; thanks for taking the time! </description>
		<content:encoded><![CDATA[<p>Hey Josh &#8212; </p>
<p>The post has been updated; thanks for taking the time!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Avrohom Katz</title>
		<link>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/comment-page-1/#comment-15672</link>
		<dc:creator>Avrohom Katz</dc:creator>
		<pubDate>Wed, 26 Aug 2009 21:04:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=1965#comment-15672</guid>
		<description>@court3nay: You are correct. This seems to be a limitation in Sphinx itself. Pat Allen mentions this in the Google group here: &lt;a href=&quot;http://groups.google.com/group/thinking-sphinx/browse_thread/thread/05d0adcd875c4222&quot; target=&quot;_blank&quot;&gt;http://groups.google.com/group/thinking-sphinx/br...&lt;/a&gt; &quot;that&#039;s a limitation in Sphinx. It calculates the ordinal values separately between indexes.&quot; There is really only one response with a possible solution, and it only works if you&#039;re not paginating search results.  
 
Thank you for bringing that point up, as I had not touched upon it in the post! </description>
		<content:encoded><![CDATA[<p>@court3nay: You are correct. This seems to be a limitation in Sphinx itself. Pat Allen mentions this in the Google group here: <a href="http://groups.google.com/group/thinking-sphinx/browse_thread/thread/05d0adcd875c4222" target="_blank">http://groups.google.com/group/thinking-sphinx/br&#8230;</a> &quot;that&#039;s a limitation in Sphinx. It calculates the ordinal values separately between indexes.&quot; There is really only one response with a possible solution, and it only works if you&#039;re not paginating search results.  </p>
<p>Thank you for bringing that point up, as I had not touched upon it in the post!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

