I am trying to construct an object of the MTree class (https://github.com/Waikato/moa/blob/master/moa/src/main/java/moa/clusterers/outliers/utils/mtree/MTree.java).
The constructor of MTree looks like this:
public MTree(DistanceFunction<? super DATA> distanceFunction,
        SplitFunction<DATA> splitFunction) {
    this(DEFAULT_MIN_NODE_CAPACITY, distanceFunction, splitFunction);
}
Here DistanceFunction is an interface, and its code is:
/**
* An object that can calculate the distance between two data objects.
*
* @param <DATA> The type of the data objects.
*/
public interface DistanceFunction<DATA> {
double calculate(DATA data1, DATA data2);
}
Its implementation is:
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* Some pre-defined implementations of {@linkplain DistanceFunction distance
* functions}.
*/
public final class DistanceFunctions {
/**
* Don't let anyone instantiate this class.
*/
private DistanceFunctions() {}
/**
* Creates a cached version of a {@linkplain DistanceFunction distance
* function}. This method is used internally by {@link MTree} to create
* a cached distance function to pass to the {@linkplain SplitFunction split
* function}.
* @param distanceFunction The distance function to create a cached version
* of.
* @return The cached distance function.
*/
public static <Data> DistanceFunction<Data> cached(final DistanceFunction<Data> distanceFunction) {
return new DistanceFunction<Data>() {
class Pair {
Data data1;
Data data2;
public Pair(Data data1, Data data2) {
this.data1 = data1;
this.data2 = data2;
}
@Override
public int hashCode() {
return data1.hashCode() ^ data2.hashCode();
}
@Override
public boolean equals(Object arg0) {
if(arg0 instanceof Pair) {
Pair that = (Pair) arg0;
return this.data1.equals(that.data1)
&& this.data2.equals(that.data2);
} else {
return false;
}
}
}
private final Map<Pair, Double> cache = new HashMap<Pair, Double>();
@Override
public double calculate(Data data1, Data data2) {
Pair pair1 = new Pair(data1, data2);
Double distance = cache.get(pair1);
if(distance != null) {
return distance;
}
Pair pair2 = new Pair(data2, data1);
distance = cache.get(pair2);
if(distance != null) {
return distance;
}
distance = distanceFunction.calculate(data1, data2);
cache.put(pair1, distance);
cache.put(pair2, distance);
return distance;
}
};
}
/**
* An interface to represent coordinates in Euclidean spaces.
* @see <a href="http://en.wikipedia.org/wiki/Euclidean_space">"Euclidean
* Space" article at Wikipedia</a>
*/
public interface EuclideanCoordinate {
/**
* The number of dimensions.
*/
int dimensions();
/**
* A method to access the {@code index}-th component of the coordinate.
*
* @param index The index of the component. Must be less than {@link
* #dimensions()}.
*/
double get(int index);
}
/**
* Calculates the distance between two {@linkplain EuclideanCoordinate
* euclidean coordinates}.
*/
public static double euclidean(EuclideanCoordinate coord1, EuclideanCoordinate coord2) {
int size = Math.min(coord1.dimensions(), coord2.dimensions());
double distance = 0;
for(int i = 0; i < size; i++) {
double diff = coord1.get(i) - coord2.get(i);
distance += diff * diff;
}
distance = Math.sqrt(distance);
return distance;
}
/**
* A {@linkplain DistanceFunction distance function} object that calculates
* the distance between two {@linkplain EuclideanCoordinate euclidean
* coordinates}.
*/
public static final DistanceFunction<EuclideanCoordinate> EUCLIDEAN = new DistanceFunction<DistanceFunctions.EuclideanCoordinate>() {
@Override
public double calculate(EuclideanCoordinate coord1, EuclideanCoordinate coord2) {
return DistanceFunctions.euclidean(coord1, coord2);
}
};
/**
* A {@linkplain DistanceFunction distance function} object that calculates
* the distance between two coordinates represented by {@linkplain
* java.util.List lists} of {@link java.lang.Integer}s.
*/
public static final DistanceFunction<List<Integer>> EUCLIDEAN_INTEGER_LIST = new DistanceFunction<List<Integer>>() {
@Override
public double calculate(List<Integer> data1, List<Integer> data2) {
class IntegerListEuclideanCoordinate implements EuclideanCoordinate {
List<Integer> list;
public IntegerListEuclideanCoordinate(List<Integer> list) { this.list = list; }
@Override public int dimensions() { return list.size(); }
@Override public double get(int index) { return list.get(index); }
};
IntegerListEuclideanCoordinate coord1 = new IntegerListEuclideanCoordinate(data1);
IntegerListEuclideanCoordinate coord2 = new IntegerListEuclideanCoordinate(data2);
return DistanceFunctions.euclidean(coord1, coord2);
}
};
/**
* A {@linkplain DistanceFunction distance function} object that calculates
* the distance between two coordinates represented by {@linkplain
* java.util.List lists} of {@link java.lang.Double}s.
*/
public static final DistanceFunction<List<Double>> EUCLIDEAN_DOUBLE_LIST = new DistanceFunction<List<Double>>() {
@Override
public double calculate(List<Double> data1, List<Double> data2) {
class DoubleListEuclideanCoordinate implements EuclideanCoordinate {
List<Double> list;
public DoubleListEuclideanCoordinate(List<Double> list) { this.list = list; }
@Override public int dimensions() { return list.size(); }
@Override public double get(int index) { return list.get(index); }
};
DoubleListEuclideanCoordinate coord1 = new DoubleListEuclideanCoordinate(data1);
DoubleListEuclideanCoordinate coord2 = new DoubleListEuclideanCoordinate(data2);
return DistanceFunctions.euclidean(coord1, coord2);
}
};
}
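My first question is: what does return new DistanceFunction<Data>() mean inside the method public static <Data> DistanceFunction<Data> cached(final DistanceFunction<Data> distanceFunction) [the method is in the class DistanceFunctions]? I am just a beginner in Java, and this is a bit hard for me to understand.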
Also, to create an object of MTree, I should create an object of DistanceFunctions and an object of ComposedSplitFunction(Which is the implementation of SplitFunction interface) and input them as parameter for MTree constructor. But I really don't know how to do that because in DistanceFunctions class, the constructor is private. So I cannot generate a parameter for the constructor of MTree. What should I do?
New Update: What I want to do is create a Junit Test for MTree, and I believe the first thing I need to do is create an object of MTree.
Answer from a uj5u.com user:
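An interface can have multiple implementations. It only defines the general contract that every implementation has to follow.
The cached implementation here, for example, takes a DistanceFunction as input and guarantees that the distance between A and B (or B and A) is calculated only once and is afterwards served from the internal cache map. The generic type of the cached function just guarantees that you can pass any type to it. For instance, you could have an implementation that, in its simplest form, takes just two integers and calculates their difference like this: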
DistanceFunction<Integer> func = (Integer a, Integer b) -> Math.abs(a - b);
This is a lambda expression, which can also be written more verbosely as an anonymous class like this:
DistanceFunction<Integer> func = new DistanceFunction<Integer>() {
    @Override
    public double calculate(Integer data1, Integer data2) {
        return Math.abs(data1 - data2);
    }
};
You can then use it like this to cache the return value for the provided input arguments:
DistanceFunction<Integer> cache = DistanceFunctions.cached(func);
double distance = cache.calculate(10, 5);
If you later make a call such as
distance = cache.calculate(10, 5);
again, or even
distance = cache.calculate(5, 10);
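the distance value in these cases is not recalculated. It is returned from the internal cache map instead, because the distance for these arguments has already been calculated. This is especially useful if you have many data points but only a limited number of combinations of them, and the calculation is rather expensive.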
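To make the return new DistanceFunction<Data>() { ... } part more concrete, here is a minimal, self-contained sketch. It is not the MOA code: the nested DistanceFunction interface is a local stand-in for the one quoted in the question, cached is a simplified variant that uses a two-element List as the cache key instead of the Pair class, and the class name CachedSketch and the underlyingCalls counter are made up for illustration. The point is that the method creates and returns an object of an anonymous class that implements the interface and keeps its own cache field.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CachedSketch {
    // Local stand-in for the DistanceFunction interface quoted in the question.
    interface DistanceFunction<DATA> {
        double calculate(DATA data1, DATA data2);
    }

    // Counts how often the wrapped function really runs (demonstration only).
    static int underlyingCalls = 0;

    // Simplified, hypothetical variant of DistanceFunctions.cached(...).
    // "new DistanceFunction<T>() { ... }" declares an anonymous class that
    // implements the interface and immediately creates one object of it.
    static <T> DistanceFunction<T> cached(final DistanceFunction<T> delegate) {
        return new DistanceFunction<T>() {
            // Field of the anonymous class; it lives as long as the returned object.
            private final Map<List<T>, Double> cache = new HashMap<>();

            @Override
            public double calculate(T a, T b) {
                Double hit = cache.get(Arrays.asList(a, b));
                if (hit != null) {
                    return hit; // served from the cache, delegate is not called
                }
                double distance = delegate.calculate(a, b);
                // Store both orderings so that (b, a) is a cache hit as well.
                cache.put(Arrays.asList(a, b), distance);
                cache.put(Arrays.asList(b, a), distance);
                return distance;
            }
        };
    }

    public static void main(String[] args) {
        DistanceFunction<Integer> func = (Integer a, Integer b) -> {
            underlyingCalls++;
            return Math.abs(a - b);
        };
        DistanceFunction<Integer> cache = cached(func);
        System.out.println(cache.calculate(10, 5)); // prints 5.0
        System.out.println(cache.calculate(10, 5)); // prints 5.0
        System.out.println(cache.calculate(5, 10)); // prints 5.0
        System.out.println("underlying calls: " + underlyingCalls); // prints underlying calls: 1
    }
}
```

The wrapped function runs only for the first call; the two later calls are answered from the map held by the anonymous object.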
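If you look further into the DistanceFunctions class you posted, you will see that it already provides some implementations, namely EUCLIDEAN, EUCLIDEAN_INTEGER_LIST and EUCLIDEAN_DOUBLE_LIST. Because they are static final, they can be used directly as constants in your code, so you never need to instantiate DistanceFunctions itself. You only need to pass input arguments to the calculate(...) method that match the implementation you have chosen.
Regarding the initialization of Waikato's MTree, a rough template might look like this: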
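The private constructor is therefore not an obstacle: a class of this shape is meant to be used through its static members only. The following self-contained sketch shows the pattern under stated assumptions. Distances and Index are hypothetical stand-ins (for DistanceFunctions and for a consumer such as MTree), and the nested DistanceFunction interface is a local copy of the one quoted in the question; none of these are the MOA classes themselves.

```java
import java.util.Arrays;
import java.util.List;

class StaticConstantSketch {
    // Local stand-in for the DistanceFunction interface quoted in the question.
    interface DistanceFunction<DATA> {
        double calculate(DATA data1, DATA data2);
    }

    // Hypothetical utility class with the same shape as DistanceFunctions:
    // a private constructor plus static final members.
    static final class Distances {
        private Distances() {} // not meant to be instantiated

        static final DistanceFunction<List<Integer>> EUCLIDEAN_INTEGER_LIST =
            (p, q) -> {
                int size = Math.min(p.size(), q.size());
                double sum = 0;
                for (int i = 0; i < size; i++) {
                    double diff = p.get(i) - q.get(i);
                    sum += diff * diff;
                }
                return Math.sqrt(sum);
            };
    }

    // Hypothetical consumer that, like the MTree constructor, receives the
    // distance function as a constructor parameter.
    static final class Index<DATA> {
        private final DistanceFunction<? super DATA> distanceFunction;

        Index(DistanceFunction<? super DATA> distanceFunction) {
            this.distanceFunction = distanceFunction;
        }

        double distance(DATA a, DATA b) {
            return distanceFunction.calculate(a, b);
        }
    }

    public static void main(String[] args) {
        // The constant is referenced through the class name; no "new Distances()".
        Index<List<Integer>> index = new Index<>(Distances.EUCLIDEAN_INTEGER_LIST);
        System.out.println(index.distance(Arrays.asList(0, 0), Arrays.asList(3, 4))); // prints 5.0
    }
}
```

In a JUnit test the same idea applies: reference the constant via the class name and pass it to the constructor under test.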
MTree mTree = new MTree(EUCLIDEAN_INTEGER_LIST, new SplitFunction<List<Integer>>(...) {
    ...
    @Override
    public SplitResult<List<Integer>> process(Set<List<Integer>> dataSet, DistanceFunction<? super List<Integer>> distanceFunction) {
        Pair<List<Integer>> promoted = ...
        Pair<Set<List<Integer>>> partitions = ...
        return new SplitResult<List<Integer>>(promoted, partitions);
    }
});
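where the parts marked with ... need to be defined and implemented by you. The code in that package, however, already provides a ComposedSplitFunction implementation, which requires a PartitionFunction and a PromotionFunction as input. Implementations of these are already available in the PartitionFunctions and PromotionFunctions classes, which relate to their interfaces in the same way as DistanceFunctions relates to DistanceFunction, as discussed here.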
