Creators: Doug Cutting and Mike Cafarella
Predecessor: Nutch
Subsequent version: YARN (Hadoop 2.0)
Hadoop implementation language: Java
Philosophy of computation: Divide and conquer for large datasets
Principle of computational processing: Bring the computation to the data rather than the data to the computation
System: A distributed programming framework
Main characteristics: Accessible, robust, scalable, simple, and fault tolerant
Storage: Hadoop Distributed File System (HDFS), a self-healing, distributed, shared storage element
Initial computational program: MapReduce, for distributed, aggregated, and collaborative parallel processing
MapReduce library implementation language: C++
Processing type: Batch
Hardware type: Heterogeneous commodity hardware
Software license: Open source
Initial applications: IR, search indexing, and web crawling
Solution type: Software solution, not a hardware solution
Scalability model: Scale-out, not scale-up
Typical dataset size: A few GBs to a few TBs
Maximum practical dataset size: Tens of TBs to a few PBs
Coherency model: Write once, read many
Default replication factor: 3
Default HDFS block size: 64 MB
Permission model: Relaxed POSIX model
Main application modules: Mahout, Hive, Pig, HBase, Sqoop, Flume, Chukwa, Pentaho ...
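The divide-and-conquer philosophy listed above can be illustrated with a word count, the canonical MapReduce example. The sketch below is conceptual, not the actual Hadoop API: the class and method names are hypothetical, and what Hadoop distributes across nodes holding the data blocks is here simulated sequentially in one process.

```java
import java.util.*;
import java.util.stream.*;

// Conceptual sketch of the MapReduce divide-and-conquer flow
// (hypothetical names, not the Hadoop API): split the input,
// map each split to (word, 1) pairs, shuffle by key, then
// reduce by summing the counts per word.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        return Arrays.stream(split.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Shuffle + reduce phase: group the pairs by word and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(
                Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    static Map<String, Integer> run(List<String> splits) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        // In Hadoop, each map task would run on the node storing its block
        // ("bring the computation to the data"); here we loop sequentially.
        for (String s : splits) pairs.addAll(map(s));
        return reduce(pairs);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Hadoop stores data", "Hadoop processes data")));
    }
}
```

Because each map call touches only its own split and the reduce is an associative merge, the same program scales out by adding nodes rather than scaling up a single machine, matching the scale-out entry in the table above.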
According to Hatcher and Gospodnetic, Lucene is a high-performance, scalable information retrieval (IR) library.
At the heart of the Lucene IR library are its indexing and searching capabilities.
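The indexing and searching capabilities rest on an inverted index. The sketch below is illustrative only and is not the Lucene API (all names are hypothetical): indexing maps each term to the set of documents containing it, and a conjunctive search intersects the posting sets of the query terms.

```java
import java.util.*;

// Minimal sketch of the inverted-index idea behind Lucene's indexing
// and searching (illustrative only, not the Lucene API).
public class InvertedIndexSketch {
    // Posting lists: term -> ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Indexing: tokenize a document and record its id under each term.
    void addDocument(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty())
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Searching (AND semantics): intersect the posting sets of all query terms.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : query.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue;
            Set<Integer> docs = postings.getOrDefault(term, Set.of());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.addDocument(1, "Lucene is an IR library");
        idx.addDocument(2, "Solr builds on Lucene");
        System.out.println(idx.search("lucene library")); // → [1]
    }
}
```

Real Lucene adds analysis (stemming, stop words), term statistics, and relevance scoring on top of this structure, but lookup-by-term followed by set operations is the core reason index search is fast.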
Like Apache Lucene, Solr is not an executable search engine but rather a toolkit or IR library.
As shown in Fig. 15, although Lucene and Solr adopt many different techniques for index searching, text mining, and IR, these algorithms can be generalized as classification algorithms.